Overview

Dataset statistics

Number of variables: 36
Number of observations: 179
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 547.4 KiB
Average record size in memory: 3.1 KiB
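
The derived figures in this table follow from the raw counts. The sketch below is only a sanity check on the numbers reported here; the variable names are illustrative and are not part of the report.

```python
# Values copied from the "Dataset statistics" table.
n_observations = 179
n_variables = 36
missing_cells = 0
total_size_kib = 547.4

# Total number of cells in the table (rows x columns).
n_cells = n_observations * n_variables

# Share of missing cells and average memory used per record.
missing_pct = 100 * missing_cells / n_cells
avg_record_kib = total_size_kib / n_observations

print(f"cells: {n_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_kib:.1f} KiB")
```

Dividing the total size by the number of observations reproduces the reported 3.1 KiB per record.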

Variable types

Numeric: 14
Categorical: 19
Unsupported: 3

Alerts

lang has constant value "en" Constant
LAW has constant value "0" Constant
MONEY has constant value "0" Constant
Name has a high cardinality: 179 distinct values High cardinality
Description has a high cardinality: 179 distinct values High cardinality
Description_clean has a high cardinality: 179 distinct values High cardinality
word_count is highly correlated with char_count and 1 other field High correlation
char_count is highly correlated with word_count and 1 other field High correlation
sentence_count is highly correlated with word_count and 1 other field High correlation
name_word_count is highly correlated with name_char_count High correlation
name_char_count is highly correlated with name_word_count High correlation
word_count is highly correlated with char_count and 1 other field High correlation
char_count is highly correlated with word_count and 1 other field High correlation
sentence_count is highly correlated with word_count and 2 other fields High correlation
name_word_count is highly correlated with name_char_count High correlation
name_char_count is highly correlated with name_word_count High correlation
CARDINAL is highly correlated with sentence_count High correlation
DATE is highly correlated with ORG High correlation
ORG is highly correlated with DATE High correlation
word_count is highly correlated with char_count and 1 other field High correlation
char_count is highly correlated with word_count and 1 other field High correlation
sentence_count is highly correlated with word_count and 1 other field High correlation
name_word_count is highly correlated with name_char_count High correlation
name_char_count is highly correlated with name_word_count High correlation
ORDINAL is highly correlated with LAW and 2 other fields High correlation
PERCENT is highly correlated with LAW and 3 other fields High correlation
LAW is highly correlated with ORDINAL and 14 other fields High correlation
LOC is highly correlated with LAW and 2 other fields High correlation
Type is highly correlated with LAW and 2 other fields High correlation
lang is highly correlated with ORDINAL and 14 other fields High correlation
PRODUCT is highly correlated with LAW and 2 other fields High correlation
FAC is highly correlated with LAW and 2 other fields High correlation
QUANTITY is highly correlated with PERCENT and 3 other fields High correlation
WORK_OF_ART is highly correlated with LAW and 2 other fields High correlation
MONEY is highly correlated with ORDINAL and 14 other fields High correlation
NORP is highly correlated with LAW and 2 other fields High correlation
TIME is highly correlated with LAW and 2 other fields High correlation
name_word_count is highly correlated with LAW and 2 other fields High correlation
LANGUAGE is highly correlated with LAW and 2 other fields High correlation
EVENT is highly correlated with LAW and 2 other fields High correlation
df_index is highly correlated with Type High correlation
Type is highly correlated with df_index and 1 other field High correlation
word_count is highly correlated with char_count and 7 other fields High correlation
char_count is highly correlated with word_count and 7 other fields High correlation
sentence_count is highly correlated with word_count and 5 other fields High correlation
avg_sentence_length is highly correlated with word_count and 2 other fields High correlation
name_word_count is highly correlated with name_char_count and 1 other field High correlation
name_char_count is highly correlated with name_word_count and 1 other field High correlation
name_avg_word_length is highly correlated with Type and 2 other fields High correlation
Polarity is highly correlated with avg_sentence_length High correlation
CARDINAL is highly correlated with word_count and 5 other fields High correlation
DATE is highly correlated with word_count and 10 other fields High correlation
EVENT is highly correlated with PERSON High correlation
FAC is highly correlated with DATE High correlation
GPE is highly correlated with word_count and 2 other fields High correlation
LOC is highly correlated with DATE High correlation
NORP is highly correlated with PERSON High correlation
ORDINAL is highly correlated with CARDINAL and 1 other field High correlation
ORG is highly correlated with sentence_count and 3 other fields High correlation
PERCENT is highly correlated with DATE and 1 other field High correlation
PERSON is highly correlated with word_count and 4 other fields High correlation
QUANTITY is highly correlated with word_count and 3 other fields High correlation
WORK_OF_ART is highly correlated with DATE and 2 other fields High correlation
Name is uniformly distributed Uniform
Description is uniformly distributed Uniform
Description_clean is uniformly distributed Uniform
Name has unique values Unique
Description has unique values Unique
Description_clean has unique values Unique
parsed is an unsupported type, check if it needs cleaning or further analysis Unsupported
entity_tags is an unsupported type, check if it needs cleaning or further analysis Unsupported
entity_types is an unsupported type, check if it needs cleaning or further analysis Unsupported
Polarity has 17 (9.5%) zeros Zeros
CARDINAL has 92 (51.4%) zeros Zeros
DATE has 121 (67.6%) zeros Zeros
GPE has 134 (74.9%) zeros Zeros
ORG has 88 (49.2%) zeros Zeros
PERSON has 76 (42.5%) zeros Zeros
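
To make the "High correlation" alerts more concrete, here is a minimal sketch of how such a flag could be derived for numeric columns. It assumes a Pearson coefficient and a cut-off of 0.9; the report does not say which coefficient or threshold produced each alert, so both are assumptions, and the toy values are hypothetical rather than taken from the profiled dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined Pearson correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def high_correlation_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:  # NaN never passes this comparison
                pairs.append((a, b, r))
    return pairs

# Hypothetical toy values, NOT taken from the profiled dataset.
toy = {
    "word_count": [10, 20, 30, 40, 50],
    "char_count": [52, 101, 149, 205, 251],
    "Polarity": [0.3, -0.1, 0.0, 0.2, -0.2],
}
for a, b, r in high_correlation_pairs(toy):
    print(f"{a} is highly correlated with {b} (r = {r:.3f})")
```

On the toy data only the char_count / word_count pair exceeds the assumed threshold, which mirrors the kind of relationship listed in the alerts.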

Reproduction

Analysis started: 2022-05-09 09:04:43.434693
Analysis finished: 2022-05-09 09:05:11.467447
Duration: 28.03 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
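
The reported duration can be checked from the two timestamps. This is a small sketch using only the standard library; it is not how the report itself computes the value.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2022-05-09 09:04:43.434693")
finished = datetime.fromisoformat("2022-05-09 09:05:11.467447")

# Subtracting two datetimes yields a timedelta.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```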

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 126
Distinct (%): 70.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.98882682
Minimum: 0
Maximum: 125
Zeros: 1
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4.9
Q1: 22.5
Median: 45
Q3: 80.5
95-th percentile: 116.1
Maximum: 125
Range: 125
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 35.64068323
Coefficient of variation (CV): 0.6855450568
Kurtosis: -0.9579792152
Mean: 51.98882682
Median Absolute Deviation (MAD): 27
Skewness: 0.4575273268
Sum: 9306
Variance: 1270.258301
Monotonicity: Not monotonic
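
Several of the df_index statistics are simple functions of one another. The sketch below recomputes them from the reported values as a consistency check; the variable names are illustrative only.

```python
import math

# Values copied from the df_index statistics.
n = 179
total = 9306
std = 35.64068323
q1, q3 = 22.5, 80.5
minimum, maximum = 0, 125

mean = total / n                 # sum divided by number of observations
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"mean={mean:.8f} cv={cv:.10f} variance={variance:.6f}")
print(f"iqr={iqr} range={value_range}")
```

Each recomputed value agrees with the corresponding figure in the report up to rounding.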
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
44                  2      1.1%
41                  2      1.1%
29                  2      1.1%
30                  2      1.1%
31                  2      1.1%
32                  2      1.1%
33                  2      1.1%
34                  2      1.1%
36                  2      1.1%
37                  2      1.1%
Other values (116)  159    88.8%

Value  Count  Frequency (%)
0      1      0.6%
1      2      1.1%
2      2      1.1%
3      2      1.1%
4      2      1.1%
5      2      1.1%
6      2      1.1%
7      2      1.1%
8      2      1.1%
9      2      1.1%

Value  Count  Frequency (%)
125    1      0.6%
124    1      0.6%
123    1      0.6%
122    1      0.6%
121    1      0.6%
120    1      0.6%
119    1      0.6%
118    1      0.6%
117    1      0.6%
116    1      0.6%

Name
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 179
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 13.4 KiB

Value                    Count
Acinetobacter baumannii  1
Bacteriophage φCb5       1
Streptococcus sobrinus   1
Treponema                1
Ureaplasma urealyticum   1
Other values (174)       174

Length

Max length: 32
Median length: 25
Mean length: 17.91061453
Min length: 6

Characters and Unicode

Total characters: 3206
Distinct characters: 65
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
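
As an illustration of character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. This is an assumption about how such counts can be obtained, not necessarily the method used by the report. Note that `unicodedata.category` returns two-letter codes (`Ll`, `Lu`, `Zs`, `Nd`), whereas the report prints the long names (Lowercase Letter, Uppercase Letter, Space Separator, Decimal Number).

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories (e.g. 'Ll', 'Lu') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)

# One of the Name values listed in this report.
name = "Bacteriophage φCb5"
for category, count in category_counts(name).most_common():
    print(category, count)
```

Applying the same function to every value of a text column and summing the counters gives a per-category breakdown of the kind shown under "Most occurring categories".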

Unique

Unique: 179
Unique (%): 100.0%

Sample

1st row: Acinetobacter baumannii
2nd row: Actinomyces israelii
3rd row: Agrobacterium tumefaciens
4th row: Anaplasma
5th row: Anaplasma phagocytophilum

Common Values

Value                    Count  Frequency (%)
Acinetobacter baumannii  1      0.6%
Bacteriophage φCb5       1      0.6%
Streptococcus sobrinus   1      0.6%
Treponema                1      0.6%
Ureaplasma urealyticum   1      0.6%
Vibrio                   1      0.6%
Vibrio cholerae          1      0.6%
Vibrio parahaemolyticus  1      0.6%
Vibrio vulnificus        1      0.6%
Wolbachia                1      0.6%
Other values (169)       169    94.4%

Length

Histogram of lengths of the category
Value               Count  Frequency (%)
streptococcus       12     3.7%
bacteriophage       10     3.0%
bacillus            9      2.7%
mycobacterium       7      2.1%
virus               7      2.1%
enterococcus        6      1.8%
mycoplasma          6      1.8%
phage               5      1.5%
campylobacter       4      1.2%
haemophilus         4      1.2%
Other values (216)  258    78.7%
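
The table above lists lowercase word tokens from the Name column with their counts. A minimal sketch of such a tally follows, assuming values are lowercased and split on whitespace (an assumption; the report does not state its tokenisation). It uses only the five sample Name values shown earlier, so its counts differ from the full-column figures.

```python
from collections import Counter

def word_frequencies(values):
    """Lowercase each value, split on whitespace, and count the words."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts

# The five sample Name values shown in this report.
names = [
    "Acinetobacter baumannii",
    "Actinomyces israelii",
    "Agrobacterium tumefaciens",
    "Anaplasma",
    "Anaplasma phagocytophilum",
]
freq = word_frequencies(names)
total = sum(freq.values())
for word, count in freq.most_common(3):
    print(f"{word}: {count} ({100 * count / total:.1f}%)")
```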

Most occurring characters

Value              Count  Frequency (%)
e                  280    8.7%
a                  275    8.6%
i                  266    8.3%
o                  226    7.0%
c                  224    7.0%
r                  201    6.3%
s                  198    6.2%
t                  174    5.4%
l                  165    5.1%
u                  149    4.6%
Other values (55)  1048   32.7%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   2808   87.6%
Uppercase Letter   214    6.7%
Space Separator    149    4.6%
Decimal Number     27     0.8%
Close Punctuation  3      0.1%
Open Punctuation   3      0.1%
Dash Punctuation   2      0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
e                  280    10.0%
a                  275    9.8%
i                  266    9.5%
o                  226    8.0%
c                  224    8.0%
r                  201    7.2%
s                  198    7.1%
t                  174    6.2%
l                  165    5.9%
u                  149    5.3%
Other values (17)  650    23.1%

Uppercase Letter
Value              Count  Frequency (%)
B                  35     16.4%
S                  23     10.7%
C                  23     10.7%
P                  21     9.8%
M                  20     9.3%
E                  13     6.1%
L                  12     5.6%
A                  12     5.6%
T                  9      4.2%
V                  7      3.3%
Other values (14)  39     18.2%

Decimal Number
Value  Count  Frequency (%)
0      7      25.9%
1      6      22.2%
2      5      18.5%
5      4      14.8%
7      1      3.7%
9      1      3.7%
4      1      3.7%
3      1      3.7%
8      1      3.7%

Dash Punctuation
Value                  Count  Frequency (%)
-                      1      50.0%
(character not shown)  1      50.0%

Space Separator
Value    Count  Frequency (%)
(space)  149    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      3      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      3      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   3019   94.2%
Common  184    5.7%
Greek   3      0.1%

Most frequent character per script

Latin
Value              Count  Frequency (%)
e                  280    9.3%
a                  275    9.1%
i                  266    8.8%
o                  226    7.5%
c                  224    7.4%
r                  201    6.7%
s                  198    6.6%
t                  174    5.8%
l                  165    5.5%
u                  149    4.9%
Other values (39)  861    28.5%

Common
Value             Count  Frequency (%)
(space)           149    81.0%
0                 7      3.8%
1                 6      3.3%
2                 5      2.7%
5                 4      2.2%
)                 3      1.6%
(                 3      1.6%
-                 1      0.5%
7                 1      0.5%
9                 1      0.5%
Other values (4)  4      2.2%

Greek
Value  Count  Frequency (%)
φ      2      66.7%
Φ      1      33.3%

Most occurring blocks

Value        Count  Frequency (%)
ASCII        3202   99.9%
None         3      0.1%
Punctuation  1      < 0.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
e                  280    8.7%
a                  275    8.6%
i                  266    8.3%
o                  226    7.1%
c                  224    7.0%
r                  201    6.3%
s                  198    6.2%
t                  174    5.4%
l                  165    5.2%
u                  149    4.7%
Other values (52)  1044   32.6%

None
Value  Count  Frequency (%)
φ      2      66.7%
Φ      1      33.3%

Punctuation
Value                  Count  Frequency (%)
(character not shown)  1      100.0%

Description
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 179
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 227.0 KiB
Acinetobacter baumannii is a typically short, almost round, rod-shaped (coccobacillus) Gram-negative bacterium. It is named after the bacteriologist Paul Baumann. It can be an opportunistic pathogen in humans, affecting people with compromised immune systems, and is becoming increasingly important as a hospital-derived (nosocomial) infection. While other species of the genus Acinetobacter are often found in soil samples (leading to the common misconception that A. baumannii is a soil organism, too), it is almost exclusively isolated from hospital environments. Although occasionally it has been found in environmental soil and water samples, its natural habitat is still not known. Bacteria of this genus lack flagella, whip-like structures many bacteria use for locomotion, but exhibit twitching or swarming motility. This may be due to the activity of type IV pili, pole-like structures that can be extended and retracted. Motility in A. baumannii may also be due to the excretion of exopolysaccharide, creating a film of high-molecular-weight sugar chains behind the bacterium to move forward. Clinical microbiologists typically differentiate members of the genus Acinetobacter from other Moraxellaceae by performing an oxidase test, as Acinetobacter spp. are the only members of the Moraxellaceae to lack cytochrome c oxidases.A. baumannii is part of the ACB complex (A. baumannii, A. calcoaceticus, and Acinetobacter genomic species 13TU). It is difficult to determine the specific species of members of the ACB complex and they comprise the most clinically relevant members of the genus. A. baumannii has also been identified as an ESKAPE pathogen (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species), a group of pathogens with a high rate of antibiotic resistance that are responsible for the majority of nosocomial infections.Colloquially, A. 
baumannii is referred to as "Iraqibacter" due to its seemingly sudden emergence in military treatment facilities during the Iraq War. It has continued to be an issue for veterans and soldiers who served in Iraq and Afghanistan. Multidrug-resistant A. baumannii has spread to civilian hospitals in part due to the transport of infected soldiers through multiple medical facilities. During the COVID-19 pandemic, coinfection with A. baumannii secondary to SARS-CoV-2 infections has been reported multiple times in literature.
Count: 1
Bacteriophage φCb5 is a bacteriophage that infects Caulobacter bacteria and other caulobacteria. The bacteriophage was discovered in 1970, it belongs to the genus Cebevirus of the Steitzviridae family and is the type species of the family. The bacteriophage is widely distributed in the soil, freshwater lakes, streams and seawater, places where caulobacteria inhabit and can be sensitive to salinity.
Count: 1
Streptococcus sobrinus is a Gram-positive, catalase-negative, non-motile, and anaerobic member of the genus Streptococcus.
Count: 1
Treponema is a genus of spiral-shaped bacteria. The major treponeme species of human pathogens is Treponema pallidum, whose subspecies are responsible for diseases such as syphilis, bejel, and yaws. Treponema carateum is the cause of pinta. Treponema paraluiscuniculi is associated with syphilis in rabbits. Treponema succinifaciens has been found in the gut microbiome of traditional rural human populations.
Count: 1
Ureaplasma urealyticum is a bacterium belonging to the genus Ureaplasma and the family Mycoplasmataceae in the order Mycoplasmatales. This family consists of the genera Mycoplasma and Ureaplasma. Its type strain is T960. There are two known biovars of this species; T960 and 27. These strains of bacterium are commonly found in the urogenital tracts of human beings, but overgrowth can lead to infections that cause the patient discomfort. Unlike most bacteria, Ureaplasma urealyticum lacks a cell wall making it unique in physiology and medical treatment.
Count: 1
Other values (174)
Count: 174

Length

Max length: 3608
Median length: 742
Mean length: 876.9497207
Min length: 72

Characters and Unicode

Total characters: 156974
Distinct characters: 122
Distinct categories: 13
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 179
Unique (%): 100.0%

Sample

1st row: Acinetobacter baumannii is a typically short, almost round, rod-shaped (coccobacillus) Gram-negative bacterium. It is named after the bacteriologist Paul Baumann. It can be an opportunistic pathogen in humans, affecting people with compromised immune systems, and is becoming increasingly important as a hospital-derived (nosocomial) infection. While other species of the genus Acinetobacter are often found in soil samples (leading to the common misconception that A. baumannii is a soil organism, too), it is almost exclusively isolated from hospital environments. Although occasionally it has been found in environmental soil and water samples, its natural habitat is still not known. Bacteria of this genus lack flagella, whip-like structures many bacteria use for locomotion, but exhibit twitching or swarming motility. This may be due to the activity of type IV pili, pole-like structures that can be extended and retracted. Motility in A. baumannii may also be due to the excretion of exopolysaccharide, creating a film of high-molecular-weight sugar chains behind the bacterium to move forward. Clinical microbiologists typically differentiate members of the genus Acinetobacter from other Moraxellaceae by performing an oxidase test, as Acinetobacter spp. are the only members of the Moraxellaceae to lack cytochrome c oxidases.A. baumannii is part of the ACB complex (A. baumannii, A. calcoaceticus, and Acinetobacter genomic species 13TU). It is difficult to determine the specific species of members of the ACB complex and they comprise the most clinically relevant members of the genus. A. baumannii has also been identified as an ESKAPE pathogen (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species), a group of pathogens with a high rate of antibiotic resistance that are responsible for the majority of nosocomial infections.Colloquially, A. 
baumannii is referred to as "Iraqibacter" due to its seemingly sudden emergence in military treatment facilities during the Iraq War. It has continued to be an issue for veterans and soldiers who served in Iraq and Afghanistan. Multidrug-resistant A. baumannii has spread to civilian hospitals in part due to the transport of infected soldiers through multiple medical facilities. During the COVID-19 pandemic, coinfection with A. baumannii secondary to SARS-CoV-2 infections has been reported multiple times in literature.
2nd row: Actinomyces israelii is a species of Gram-positive, rod-shaped bacteria within the genus Actinomyces. Known to live commensally on and within humans, A. israelii is an opportunistic pathogen and a cause of actinomycosis. Many physiologically diverse strains of the species are known to exist, though not all are strict anaerobes. It was named after the German surgeon James Adolf Israel (1848–1926), who studied the organism for the first time in 1878.
3rd row: Agrobacterium radiobacter (more commonly known as Agrobacterium tumefaciens) is the causal agent of crown gall disease (the formation of tumours) in over 140 species of eudicots. It is a rod-shaped, Gram-negative soil bacterium. Symptoms are caused by the insertion of a small segment of DNA (known as the T-DNA, for 'transfer DNA', not to be confused with tRNA that transfers amino acids during protein synthesis), from a plasmid into the plant cell, which is incorporated at a semi-random location into the plant genome. Plant genomes can be engineered by use of Agrobacterium for the delivery of sequences hosted in T-DNA binary vectors. Agrobacterium tumefaciens is an alphaproteobacterium of the family Rhizobiaceae, which includes the nitrogen-fixing legume symbionts. Unlike the nitrogen-fixing symbionts, tumor-producing Agrobacterium species are pathogenic and do not benefit the plant. The wide variety of plants affected by Agrobacterium makes it of great concern to the agriculture industry.Economically, A. tumefaciens is a serious pathogen of walnuts, grape vines, stone fruits, nut trees, sugar beets, horse radish, and rhubarb, and the persistent nature of the tumors or galls caused by the disease make it particularly harmful for perennial crops.Agrobacterium tumefaciens grows optimally at 28 °C. The doubling time can range from 2.5–4h depending on the media, culture format, and level of aeration. At temperatures above 30 °C, A. tumefaciens begins to experience heat shock which is likely to result in errors in cell division.
4th row: Anaplasma is a genus of bacteria of the alphaproteobacterial order Rickettsiales, family Anaplasmataceae. Anaplasma species reside in host blood cells and lead to the disease anaplasmosis. The disease most commonly occurs in areas where competent tick vectors are indigenous, including tropical and semitropical areas of the world for intraerythrocytic Anaplasma spp.Anaplasma species are biologically transmitted by Ixodes deer-tick vectors, and the prototypical species, A. marginale, can be mechanically transmitted by biting flies and iatrogenically with blood-contaminated instruments. One of the major consequences of infection by bovine red blood cells by A. marginale is the development of nonhaemolytic anaemia, thus the absence of hemoglobinuria, which allows clinical differentiation from another major tick-borne disease, bovine babesiosis, caused by Babesia bigemina.Species of veterinary interest include: Anaplasma marginale and Anaplasma centrale in cattle Anaplasma ovis and Anaplasma mesaeterum in sheep and goats Anaplasma phagocytophilum in dogs, cats, and horses (see human granulocytic anaplasmosis) Anaplasma platys in dogs
5th row: Anaplasma phagocytophilum (formerly Ehrlichia phagocytophilum) is a Gram-negative bacterium that is unusual in its tropism to neutrophils. It causes anaplasmosis in sheep and cattle, also known as tick-borne fever and pasture fever, and also causes the zoonotic disease human granulocytic anaplasmosis.A. phagocytophilum is a Gram-negative, obligate bacterium of neutrophils. It causes human granulocytic anaplasmosis, which is a tick-borne rickettsial disease. Because this bacterium invades neutrophils, it has a unique adaptation and pathogenetic mechanism.

Common Values

Value  Count  Frequency (%)
Acinetobacter baumannii is a typically short, almost round, rod-shaped (coccobacillus) Gram-negative bacterium. It is named after the bacteriologist Paul Baumann. It can be an opportunistic pathogen in humans, affecting people with compromised immune systems, and is becoming increasingly important as a hospital-derived (nosocomial) infection. While other species of the genus Acinetobacter are often found in soil samples (leading to the common misconception that A. baumannii is a soil organism, too), it is almost exclusively isolated from hospital environments. Although occasionally it has been found in environmental soil and water samples, its natural habitat is still not known. Bacteria of this genus lack flagella, whip-like structures many bacteria use for locomotion, but exhibit twitching or swarming motility. This may be due to the activity of type IV pili, pole-like structures that can be extended and retracted. Motility in A. baumannii may also be due to the excretion of exopolysaccharide, creating a film of high-molecular-weight sugar chains behind the bacterium to move forward. Clinical microbiologists typically differentiate members of the genus Acinetobacter from other Moraxellaceae by performing an oxidase test, as Acinetobacter spp. are the only members of the Moraxellaceae to lack cytochrome c oxidases.A. baumannii is part of the ACB complex (A. baumannii, A. calcoaceticus, and Acinetobacter genomic species 13TU). It is difficult to determine the specific species of members of the ACB complex and they comprise the most clinically relevant members of the genus. A. baumannii has also been identified as an ESKAPE pathogen (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species), a group of pathogens with a high rate of antibiotic resistance that are responsible for the majority of nosocomial infections.Colloquially, A. 
baumannii is referred to as "Iraqibacter" due to its seemingly sudden emergence in military treatment facilities during the Iraq War. It has continued to be an issue for veterans and soldiers who served in Iraq and Afghanistan. Multidrug-resistant A. baumannii has spread to civilian hospitals in part due to the transport of infected soldiers through multiple medical facilities. During the COVID-19 pandemic, coinfection with A. baumannii secondary to SARS-CoV-2 infections has been reported multiple times in literature.
Count: 1, Frequency: 0.6%
Bacteriophage φCb5 is a bacteriophage that infects Caulobacter bacteria and other caulobacteria. The bacteriophage was discovered in 1970, it belongs to the genus Cebevirus of the Steitzviridae family and is the type species of the family. The bacteriophage is widely distributed in the soil, freshwater lakes, streams and seawater, places where caulobacteria inhabit and can be sensitive to salinity.
Count: 1, Frequency: 0.6%
Streptococcus sobrinus is a Gram-positive, catalase-negative, non-motile, and anaerobic member of the genus Streptococcus.
Count: 1, Frequency: 0.6%
Treponema is a genus of spiral-shaped bacteria. The major treponeme species of human pathogens is Treponema pallidum, whose subspecies are responsible for diseases such as syphilis, bejel, and yaws. Treponema carateum is the cause of pinta. Treponema paraluiscuniculi is associated with syphilis in rabbits. Treponema succinifaciens has been found in the gut microbiome of traditional rural human populations.
Count: 1, Frequency: 0.6%
Ureaplasma urealyticum is a bacterium belonging to the genus Ureaplasma and the family Mycoplasmataceae in the order Mycoplasmatales. This family consists of the genera Mycoplasma and Ureaplasma. Its type strain is T960. There are two known biovars of this species; T960 and 27. These strains of bacterium are commonly found in the urogenital tracts of human beings, but overgrowth can lead to infections that cause the patient discomfort. Unlike most bacteria, Ureaplasma urealyticum lacks a cell wall making it unique in physiology and medical treatment.
Count: 1, Frequency: 0.6%
Vibrio is a genus of Gram-negative bacteria, possessing a curved-rod (comma) shape, several species of which can cause foodborne infection, usually associated with eating undercooked seafood. Typically found in salt water, Vibrio species are facultative anaerobes that test positive for oxidase and do not form spores. All members of the genus are motile. They are able to have polar or lateral flagellum with or without sheaths. Vibrio species typically possess two chromosomes, which is unusual for bacteria. Each chromosome has a distinct and independent origin of replication, and are conserved together over time in the genus. Recent phylogenies have been constructed based on a suite of genes (multilocus sequence analysis).O. F. Müller (1773, 1786) described eight species of the genus Vibrio (included in Infusoria), three of which were spirilliforms. Some of the other species are today assigned to eukaryote taxa, e.g., to the euglenoid Peranema or to the diatom Bacillaria. However, Vibrio Müller, 1773 became regarded as the name of a zoological genus, and the name of the bacterial genus became Vibrio Pacini, 1854. Filippo Pacini isolated micro-organisms he called "vibrions" from cholera patients in 1854, because of their motility. In Latin "vibrio" means "to quiver".Vibrio spp. are commonly found in marine environments. Marine Vibrio species are highly salt tolerant and can grow in wide range of salinity. S.I. Paul et al. (2021) isolated, characterized, and identified multiple strains of Vibrio species (Vibrio alginolyticus, Vibrio natriegens, Vibrio pelagius, Vibrio azureus) from marine sponges of the Saint Martin's Island Area of the Bay of Bengal, Bangladesh. Where, Vibrio species were found most dominant bacteria in marine environment.
Count: 1, Frequency: 0.6%
Vibrio cholerae is a species of Gram-negative, facultative anaerobe and comma-shaped bacteria. The bacteria naturally live in brackish or saltwater where they attach themselves easily to the chitin-containing shells of crabs, shrimps, and other shellfish. Some strains of V. cholerae are pathogenic to humans and cause a deadly disease cholera, which can be derived from the consumption of undercooked or raw marine life species.V. cholerae was first described by Félix-Archimède Pouchet in 1849 as some kind of protozoa. Filippo Pacini correctly identified it as a bacterium and from him, the scientific name is adopted. The bacterium as the cause of cholera was discovered by Robert Koch in 1884. Sambhu Nath De isolated the cholera toxin and demonstrated the toxin as the cause of cholera in 1959. The bacterium has a flagellum at one pole and several pili throughout its cell surface. It undergoes respiratory and fermentative metabolism. Two serogroups called O1 and O139 are responsible for cholera outbreaks. Infection is mainly through drinking contaminated water, therefore is linked to sanitation and hygiene. When ingested, it invades the intestinal mucosa can cause diarrhea and vomiting in a host within several hours to 2–3 days of ingestion. Oral rehydration solution and antibiotics such as fluoroquinolones and tetracyclines are the common treatment methods. V. cholerae has two circular DNA. One DNA produces the cholera toxin (CT), a protein that causes profuse, watery diarrhea (known as "rice-water stool"). But the DNA does not directly code for the toxin as the genes for cholera toxin are carried by CTXphi (CTXφ), a temperate bacteriophage (virus). The virus when inserted into the bacterial DNA only produce the toxin.
Count: 1, Frequency: 0.6%
Vibrio parahaemolyticus is a curved, rod-shaped, Gram-negative bacterium found in the sea and in estuaries which, when ingested, causes gastrointestinal illness in humans. V. parahaemolyticus is oxidase positive, facultatively aerobic, and does not form spores. Like other members of the genus Vibrio, this species is motile, with a single, polar flagellum.
Count: 1, Frequency: 0.6%
Vibrio vulnificus is a species of Gram-negative, motile, curved rod-shaped (bacillus), pathogenic bacteria of the genus Vibrio. Present in marine environments such as estuaries, brackish ponds, or coastal areas, V. vulnificus is related to V. cholerae, the causative agent of cholera. At least one strain of V. vulnificus is bioluminescent.Infection with V. vulnificus leads to rapidly expanding cellulitis or sepsis.: 279  It was first isolated as a source of disease in 1976.
Count: 1, Frequency: 0.6%
Wolbachia is a genus of intracellular bacteria that infects mainly arthropod species, including a high proportion of insects, and also some nematodes. It is one of the most common parasitic microbes and is possibly the most common reproductive parasite in the biosphere. Its interactions with its hosts are often complex, and in some cases have evolved to be mutualistic rather than parasitic. Some host species cannot reproduce, or even survive, without Wolbachia colonisation. One study concluded that more than 16% of neotropical insect species carry bacteria of this genus, and as many as 25 to 70% of all insect species are estimated to be potential hosts.
Count: 1, Frequency: 0.6%
Other values (169)
Count: 169, Frequency: 94.4%

Length

Histogram of lengths of the category
Value                Count  Frequency (%)
the                  1227   5.3%
of                   900    3.9%
and                  712    3.1%
in                   626    2.7%
is                   624    2.7%
a                    573    2.5%
to                   438    1.9%
are                  249    1.1%
as                   241    1.0%
it                   227    1.0%
Other values (4437)  17401  74.9%

Most occurring characters

Value               Count  Frequency (%)
(space)             23029  14.7%
e                   14460  9.2%
a                   11244  7.2%
i                   11100  7.1%
t                   10073  6.4%
s                   9329   5.9%
o                   9195   5.9%
n                   8688   5.5%
r                   7554   4.8%
c                   6262   4.0%
Other values (112)  46040  29.3%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   124794  79.5%
Space Separator    23036   14.7%
Other Punctuation  3470    2.2%
Uppercase Letter   3206    2.0%
Decimal Number     1094    0.7%
Dash Punctuation   534     0.3%
Close Punctuation  300     0.2%
Open Punctuation   300     0.2%
Control            153     0.1%
Math Symbol        68      < 0.1%
Other values (3)   19      < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
e                  14460  11.6%
a                  11244  9.0%
i                  11100  8.9%
t                  10073  8.1%
s                  9329   7.5%
o                  9195   7.4%
n                  8688   7.0%
r                  7554   6.1%
c                  6262   5.0%
l                  5430   4.4%
Other values (44)  31459  25.2%

Uppercase Letter
Value              Count  Frequency (%)
T                  320    10.0%
A                  316    9.9%
B                  290    9.0%
I                  279    8.7%
S                  255    8.0%
C                  214    6.7%
M                  169    5.3%
G                  152    4.7%
P                  139    4.3%
N                  138    4.3%
Other values (17)  934    29.1%

Other Punctuation
Value  Count  Frequency (%)
.      1656   47.7%
,      1535   44.2%
"      88     2.5%
'      53     1.5%
;      47     1.4%
:      38     1.1%
%      33     1.0%
/      18     0.5%
?      1      < 0.1%
§      1      < 0.1%

Decimal Number
Value  Count  Frequency (%)
1      237    21.7%
0      236    21.6%
2      139    12.7%
9      89     8.1%
8      75     6.9%
5      73     6.7%
4      68     6.2%
3      67     6.1%
7      62     5.7%
6      48     4.4%

Math Symbol
Value                  Count  Frequency (%)
=                      60     88.2%
+                      2      2.9%
×                      2      2.9%
<                      1      1.5%
>                      1      1.5%
~                      1      1.5%
(character not shown)  1      1.5%

Space Separator
Value                  Count  Frequency (%)
(space)                23029  > 99.9%
(character not shown)  5      < 0.1%
(character not shown)  2      < 0.1%

Dash Punctuation
Value                  Count  Frequency (%)
-                      475    89.0%
(character not shown)  53     9.9%
(character not shown)  6      1.1%

Close Punctuation
Value  Count  Frequency (%)
)      298    99.3%
]      2      0.7%

Open Punctuation
Value  Count  Frequency (%)
(      298    99.3%
[      2      0.7%

Control
Value                  Count  Frequency (%)
(character not shown)  153    100.0%

Other Symbol
Value  Count  Frequency (%)
°      17     100.0%

Final Punctuation
Value                  Count  Frequency (%)
(character not shown)  1      100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      1      100.0%

Most occurring scripts

Value     Count     Frequency (%)
Latin     127923    81.5%
Common    28980     18.5%
Greek     71        < 0.1%

Most frequent character per script

Latin
Value                Count    Frequency (%)
e                    14460    11.3%
a                    11244    8.8%
i                    11100    8.7%
t                    10073    7.9%
s                    9329     7.3%
o                    9195     7.2%
n                    8688     6.8%
r                    7554     5.9%
c                    6262     4.9%
l                    5430     4.2%
Other values (51)    34588    27.0%

Common
Value                Count    Frequency (%)
(space)              23029    79.5%
.                    1656     5.7%
,                    1535     5.3%
-                    475      1.6%
)                    298      1.0%
(                    298      1.0%
1                    237      0.8%
0                    236      0.8%
(not rendered)       153      0.5%
2                    139      0.5%
Other values (32)    924      3.2%

Greek
Value                Count    Frequency (%)
μ                    16       22.5%
κ                    8        11.3%
φ                    6        8.5%
β                    6        8.5%
τ                    4        5.6%
ς                    4        5.6%
ό                    4        5.6%
λ                    4        5.6%
ρ                    3        4.2%
σ                    3        4.2%
Other values (9)     13       18.3%

Most occurring blocks

Value             Count     Frequency (%)
ASCII             156791    99.9%
None              115       0.1%
Punctuation       67        < 0.1%
Math Operators    1         < 0.1%

Most frequent character per block

ASCII
Value                Count    Frequency (%)
(space)              23029    14.7%
e                    14460    9.2%
a                    11244    7.2%
i                    11100    7.1%
t                    10073    6.4%
s                    9329     5.9%
o                    9195     5.9%
n                    8688     5.5%
r                    7554     4.8%
c                    6262     4.0%
Other values (74)    45857    29.2%

Punctuation
Value                Count    Frequency (%)
(not rendered)       53       79.1%
(not rendered)       6        9.0%
(not rendered)       5        7.5%
(not rendered)       2        3.0%
(not rendered)       1        1.5%

None
Value                Count    Frequency (%)
°                    17       14.8%
μ                    16       13.9%
κ                    8        7.0%
φ                    6        5.2%
µ                    6        5.2%
β                    6        5.2%
τ                    4        3.5%
ς                    4        3.5%
ό                    4        3.5%
λ                    4        3.5%
Other values (22)    40       34.8%

Math Operators
Value                Count    Frequency (%)
(not rendered)       1        100.0%

Type
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.1 KiB

Bacteria         126
Bacteriophage    53

Length

Max length: 13
Median length: 8
Mean length: 9.480446927
Min length: 8

Characters and Unicode

Total characters: 1697
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
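
As a rough illustration of what "character properties" means here, the sketch below recomputes the character totals for this variable with Python's standard `unicodedata` module. The report does not show how the profiler itself does this, so the approach and the variable names (`value_counts`, `category_totals`) are assumptions made for the example; only the value counts (126 and 53) come from the report.

```python
import unicodedata
from collections import Counter

# Value counts for the Type column, copied from this report.
value_counts = {"Bacteria": 126, "Bacteriophage": 53}

# Tally characters per Unicode general category, weighting each
# value by the number of rows in which it occurs.
category_totals = Counter()
for value, n_rows in value_counts.items():
    for ch in value:
        # unicodedata.category returns a two-letter code such as
        # "Lu" (uppercase letter) or "Ll" (lowercase letter).
        category_totals[unicodedata.category(ch)] += n_rows

total_chars = sum(category_totals.values())
distinct_chars = len(set("".join(value_counts)))

print(total_chars)            # total characters across all 179 rows
print(distinct_chars)         # number of distinct characters
print(dict(category_totals))  # counts per general category
```

Under these assumptions the totals agree with the figures reported for this variable: 1697 characters in total, 11 distinct characters, 1518 lowercase letters and 179 uppercase letters.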

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Bacteria
2nd row: Bacteria
3rd row: Bacteria
4th row: Bacteria
5th row: Bacteria

Common Values

Value            Count    Frequency (%)
Bacteria         126      70.4%
Bacteriophage    53       29.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value            Count    Frequency (%)
bacteria         126      70.4%
bacteriophage    53       29.6%

Most occurring characters

Value    Count    Frequency (%)
a        358      21.1%
e        232      13.7%
B        179      10.5%
c        179      10.5%
t        179      10.5%
r        179      10.5%
i        179      10.5%
o        53       3.1%
p        53       3.1%
h        53       3.1%

Most occurring categories

Value               Count    Frequency (%)
Lowercase Letter    1518     89.5%
Uppercase Letter    179      10.5%
10.5%

Most frequent character per category

Lowercase Letter
Value    Count    Frequency (%)
a        358      23.6%
e        232      15.3%
c        179      11.8%
t        179      11.8%
r        179      11.8%
i        179      11.8%
o        53       3.5%
p        53       3.5%
h        53       3.5%
g        53       3.5%

Uppercase Letter
Value    Count    Frequency (%)
B        179      100.0%

Most occurring scripts

Value    Count    Frequency (%)
Latin    1697     100.0%

Most frequent character per script

Latin
Value    Count    Frequency (%)
a        358      21.1%
e        232      13.7%
B        179      10.5%
c        179      10.5%
t        179      10.5%
r        179      10.5%
i        179      10.5%
o        53       3.1%
p        53       3.1%
h        53       3.1%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    1697     100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
a        358      21.1%
e        232      13.7%
B        179      10.5%
c        179      10.5%
t        179      10.5%
r        179      10.5%
i        179      10.5%
o        53       3.1%
p        53       3.1%
h        53       3.1%
3.1%

lang
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 10.4 KiB

en    179

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 358
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Common Values

Value    Count    Frequency (%)
en       179      100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count    Frequency (%)
en       179      100.0%

Most occurring characters

Value    Count    Frequency (%)
e        179      50.0%
n        179      50.0%

Most occurring categories

Value               Count    Frequency (%)
Lowercase Letter    358      100.0%

Most frequent character per category

Lowercase Letter
Value    Count    Frequency (%)
e        179      50.0%
n        179      50.0%

Most occurring scripts

Value    Count    Frequency (%)
Latin    358      100.0%

Most frequent character per script

Latin
Value    Count    Frequency (%)
e        179      50.0%
n        179      50.0%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    358      100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
e        179      50.0%
n        179      50.0%

Description_clean
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 179
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 144.9 KiB
acinetobacter baumannii typically short almost round rodshaped coccobacillus gramnegative bacterium named bacteriologist paul baumann opportunistic pathogen human affecting people compromised immune system becoming increasingly important hospitalderived nosocomial infection specie genus acinetobacter often found soil sample leading common misconception baumannii soil organism almost exclusively isolated hospital environment although occasionally found environmental soil water sample natural habitat still known bacteria genus lack flagellum whiplike structure many bacteria use locomotion exhibit twitching swarming motility may due activity type iv pili polelike structure extended retracted motility baumannii may also due excretion exopolysaccharide creating film highmolecularweight sugar chain behind bacterium move forward clinical microbiologist typically differentiate member genus acinetobacter moraxellaceae performing oxidase test acinetobacter spp member moraxellaceae lack cytochrome c oxidasesa baumannii part acb complex baumannii calcoaceticus acinetobacter genomic specie 13tu difficult determine specific specie member acb complex comprise clinically relevant member genus baumannii also identified eskape pathogen enterococcus faecium staphylococcus aureus klebsiella pneumoniae acinetobacter baumannii pseudomonas aeruginosa enterobacter specie group pathogen high rate antibiotic resistance responsible majority nosocomial infectionscolloquially baumannii referred iraqibacter due seemingly sudden emergence military treatment facility iraq war continued issue veteran soldier served iraq afghanistan multidrugresistant baumannii spread civilian hospital part due transport infected soldier multiple medical facility covid19 pandemic coinfection baumannii secondary sarscov2 infection reported multiple time literature
 
1
bacteriophage φcb5 bacteriophage infects caulobacter bacteria caulobacteria bacteriophage discovered 1970 belongs genus cebevirus steitzviridae family type specie family bacteriophage widely distributed soil freshwater lake stream seawater place caulobacteria inhabit sensitive salinity
 
1
streptococcus sobrinus grampositive catalasenegative nonmotile anaerobic member genus streptococcus
 
1
treponema genus spiralshaped bacteria major treponeme specie human pathogen treponema pallidum whose subspecies responsible disease syphilis bejel yaw treponema carateum cause pinta treponema paraluiscuniculi associated syphilis rabbit treponema succinifaciens found gut microbiome traditional rural human population
 
1
ureaplasma urealyticum bacterium belonging genus ureaplasma family mycoplasmataceae order mycoplasmatales family consists genus mycoplasma ureaplasma type strain t960 two known biovars specie t960 27 strain bacterium commonly found urogenital tract human being overgrowth lead infection cause patient discomfort unlike bacteria ureaplasma urealyticum lack cell wall making unique physiology medical treatment
 
1
Other values (174)
174 

Length

Max length: 2645
Median length: 575
Mean length: 652.3743017
Min length: 53

Characters and Unicode

Total characters: 116775
Distinct characters: 66
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 179
Unique (%): 100.0%

Sample

1st rowacinetobacter baumannii typically short almost round rodshaped coccobacillus gramnegative bacterium named bacteriologist paul baumann opportunistic pathogen human affecting people compromised immune system becoming increasingly important hospitalderived nosocomial infection specie genus acinetobacter often found soil sample leading common misconception baumannii soil organism almost exclusively isolated hospital environment although occasionally found environmental soil water sample natural habitat still known bacteria genus lack flagellum whiplike structure many bacteria use locomotion exhibit twitching swarming motility may due activity type iv pili polelike structure extended retracted motility baumannii may also due excretion exopolysaccharide creating film highmolecularweight sugar chain behind bacterium move forward clinical microbiologist typically differentiate member genus acinetobacter moraxellaceae performing oxidase test acinetobacter spp member moraxellaceae lack cytochrome c oxidasesa baumannii part acb complex baumannii calcoaceticus acinetobacter genomic specie 13tu difficult determine specific specie member acb complex comprise clinically relevant member genus baumannii also identified eskape pathogen enterococcus faecium staphylococcus aureus klebsiella pneumoniae acinetobacter baumannii pseudomonas aeruginosa enterobacter specie group pathogen high rate antibiotic resistance responsible majority nosocomial infectionscolloquially baumannii referred iraqibacter due seemingly sudden emergence military treatment facility iraq war continued issue veteran soldier served iraq afghanistan multidrugresistant baumannii spread civilian hospital part due transport infected soldier multiple medical facility covid19 pandemic coinfection baumannii secondary sarscov2 infection reported multiple time literature
2nd rowactinomyces israelii specie grampositive rodshaped bacteria within genus actinomyces known live commensally within human israelii opportunistic pathogen cause actinomycosis many physiologically diverse strain specie known exist though strict anaerobe named german surgeon james adolf israel 18481926 studied organism first time 1878
3rd rowagrobacterium radiobacter commonly known agrobacterium tumefaciens causal agent crown gall disease formation tumour 140 specie eudicots rodshaped gramnegative soil bacterium symptom caused insertion small segment dna known tdna transfer dna confused trna transfer amino acid protein synthesis plasmid plant cell incorporated semirandom location plant genome plant genome engineered use agrobacterium delivery sequence hosted tdna binary vector agrobacterium tumefaciens alphaproteobacterium family rhizobiaceae includes nitrogenfixing legume symbionts unlike nitrogenfixing symbionts tumorproducing agrobacterium specie pathogenic benefit plant wide variety plant affected agrobacterium make great concern agriculture industryeconomically tumefaciens serious pathogen walnut grape vine stone fruit nut tree sugar beet horse radish rhubarb persistent nature tumor gall caused disease make particularly harmful perennial cropsagrobacterium tumefaciens grows optimally 28 c doubling time range 254h depending medium culture format level aeration temperature 30 c tumefaciens begin experience heat shock likely result error cell division
4th rowanaplasma genus bacteria alphaproteobacterial order rickettsiales family anaplasmataceae anaplasma specie reside host blood cell lead disease anaplasmosis disease commonly occurs area competent tick vector indigenous including tropical semitropical area world intraerythrocytic anaplasma sppanaplasma specie biologically transmitted ixodes deertick vector prototypical specie marginale mechanically transmitted biting fly iatrogenically bloodcontaminated instrument one major consequence infection bovine red blood cell marginale development nonhaemolytic anaemia thus absence hemoglobinuria allows clinical differentiation another major tickborne disease bovine babesiosis caused babesia bigeminaspecies veterinary interest include anaplasma marginale anaplasma centrale cattle anaplasma ovis anaplasma mesaeterum sheep goat anaplasma phagocytophilum dog cat horse see human granulocytic anaplasmosis anaplasma platy dog
5th rowanaplasma phagocytophilum formerly ehrlichia phagocytophilum gramnegative bacterium unusual tropism neutrophil cause anaplasmosis sheep cattle also known tickborne fever pasture fever also cause zoonotic disease human granulocytic anaplasmosisa phagocytophilum gramnegative obligate bacterium neutrophil cause human granulocytic anaplasmosis tickborne rickettsial disease bacterium invades neutrophil unique adaptation pathogenetic mechanism

Common Values

ValueCountFrequency (%)
acinetobacter baumannii typically short almost round rodshaped coccobacillus gramnegative bacterium named bacteriologist paul baumann opportunistic pathogen human affecting people compromised immune system becoming increasingly important hospitalderived nosocomial infection specie genus acinetobacter often found soil sample leading common misconception baumannii soil organism almost exclusively isolated hospital environment although occasionally found environmental soil water sample natural habitat still known bacteria genus lack flagellum whiplike structure many bacteria use locomotion exhibit twitching swarming motility may due activity type iv pili polelike structure extended retracted motility baumannii may also due excretion exopolysaccharide creating film highmolecularweight sugar chain behind bacterium move forward clinical microbiologist typically differentiate member genus acinetobacter moraxellaceae performing oxidase test acinetobacter spp member moraxellaceae lack cytochrome c oxidasesa baumannii part acb complex baumannii calcoaceticus acinetobacter genomic specie 13tu difficult determine specific specie member acb complex comprise clinically relevant member genus baumannii also identified eskape pathogen enterococcus faecium staphylococcus aureus klebsiella pneumoniae acinetobacter baumannii pseudomonas aeruginosa enterobacter specie group pathogen high rate antibiotic resistance responsible majority nosocomial infectionscolloquially baumannii referred iraqibacter due seemingly sudden emergence military treatment facility iraq war continued issue veteran soldier served iraq afghanistan multidrugresistant baumannii spread civilian hospital part due transport infected soldier multiple medical facility covid19 pandemic coinfection baumannii secondary sarscov2 infection reported multiple time literature1
 
0.6%
bacteriophage φcb5 bacteriophage infects caulobacter bacteria caulobacteria bacteriophage discovered 1970 belongs genus cebevirus steitzviridae family type specie family bacteriophage widely distributed soil freshwater lake stream seawater place caulobacteria inhabit sensitive salinity1
 
0.6%
streptococcus sobrinus grampositive catalasenegative nonmotile anaerobic member genus streptococcus1
 
0.6%
treponema genus spiralshaped bacteria major treponeme specie human pathogen treponema pallidum whose subspecies responsible disease syphilis bejel yaw treponema carateum cause pinta treponema paraluiscuniculi associated syphilis rabbit treponema succinifaciens found gut microbiome traditional rural human population1
 
0.6%
ureaplasma urealyticum bacterium belonging genus ureaplasma family mycoplasmataceae order mycoplasmatales family consists genus mycoplasma ureaplasma type strain t960 two known biovars specie t960 27 strain bacterium commonly found urogenital tract human being overgrowth lead infection cause patient discomfort unlike bacteria ureaplasma urealyticum lack cell wall making unique physiology medical treatment1
 
0.6%
vibrio genus gramnegative bacteria possessing curvedrod comma shape several specie cause foodborne infection usually associated eating undercooked seafood typically found salt water vibrio specie facultative anaerobe test positive oxidase form spore member genus motile able polar lateral flagellum without sheath vibrio specie typically posse two chromosome unusual bacteria chromosome distinct independent origin replication conserved together time genus recent phylogeny constructed based suite gene multilocus sequence analysiso f müller 1773 1786 described eight specie genus vibrio included infusoria three spirilliforms specie today assigned eukaryote taxon eg euglenoid peranema diatom bacillaria however vibrio müller 1773 became regarded name zoological genus name bacterial genus became vibrio pacini 1854 filippo pacini isolated microorganism called vibrion cholera patient 1854 motility latin vibrio mean quivervibrio spp commonly found marine environment marine vibrio specie highly salt tolerant grow wide range salinity si paul et al 2021 isolated characterized identified multiple strain vibrio specie vibrio alginolyticus vibrio natriegens vibrio pelagius vibrio azureus marine sponge saint martin island area bay bengal bangladesh vibrio specie found dominant bacteria marine environment1
 
0.6%
vibrio cholerae specie gramnegative facultative anaerobe commashaped bacteria bacteria naturally live brackish saltwater attach easily chitincontaining shell crab shrimp shellfish strain v cholerae pathogenic human cause deadly disease cholera derived consumption undercooked raw marine life speciesv cholerae first described félixarchimède pouchet 1849 kind protozoa filippo pacini correctly identified bacterium scientific name adopted bacterium cause cholera discovered robert koch 1884 sambhu nath de isolated cholera toxin demonstrated toxin cause cholera 1959 bacterium flagellum one pole several pili throughout cell surface undergoes respiratory fermentative metabolism two serogroups called o1 o139 responsible cholera outbreak infection mainly drinking contaminated water therefore linked sanitation hygiene ingested invades intestinal mucosa cause diarrhea vomiting host within several hour 23 day ingestion oral rehydration solution antibiotic fluoroquinolones tetracycline common treatment method v cholerae two circular dna one dna produce cholera toxin ct protein cause profuse watery diarrhea known ricewater stool dna directly code toxin gene cholera toxin carried ctxphi ctxφ temperate bacteriophage virus virus inserted bacterial dna produce toxin1
 
0.6%
vibrio parahaemolyticus curved rodshaped gramnegative bacterium found sea estuary ingested cause gastrointestinal illness human v parahaemolyticus oxidase positive facultatively aerobic form spore like member genus vibrio specie motile single polar flagellum1
 
0.6%
vibrio vulnificus specie gramnegative motile curved rodshaped bacillus pathogenic bacteria genus vibrio present marine environment estuary brackish pond coastal area v vulnificus related v cholerae causative agent cholera least one strain v vulnificus bioluminescentinfection v vulnificus lead rapidly expanding cellulitis sepsis 279 first isolated source disease 19761
 
0.6%
wolbachia genus intracellular bacteria infects mainly arthropod specie including high proportion insect also nematode one common parasitic microbe possibly common reproductive parasite biosphere interaction host often complex case evolved mutualistic rather parasitic host specie cannot reproduce even survive without wolbachia colonisation one study concluded 16 neotropical insect specie carry bacteria genus many 25 70 insect specie estimated potential host1
 
0.6%
Other values (169)169
94.4%

Length

Histogram of lengths of the category
Value                  Count    Frequency (%)
specie                 195      1.4%
bacteria               147      1.0%
genus                  135      1.0%
human                  134      0.9%
cause                  125      0.9%
bacterium              118      0.8%
infection              117      0.8%
cell                   110      0.8%
phage                  100      0.7%
also                   96       0.7%
Other values (4023)    12870    91.0%

Most occurring characters

Value                Count    Frequency (%)
(space)              13968    12.0%
e                    11940    10.2%
i                    9184     7.9%
a                    8898     7.6%
t                    7136     6.1%
o                    6932     5.9%
n                    6787     5.8%
r                    6676     5.7%
s                    6243     5.3%
c                    6129     5.2%
Other values (56)    32882    28.2%

Most occurring categories

Value                    Count     Frequency (%)
Lowercase Letter         101712    87.1%
Space Separator          13968     12.0%
Decimal Number           1094      0.9%
Connector Punctuation    1         < 0.1%

Most frequent character per category

Lowercase Letter
Value                Count    Frequency (%)
e                    11940    11.7%
i                    9184     9.0%
a                    8898     8.7%
t                    7136     7.0%
o                    6932     6.8%
n                    6787     6.7%
r                    6676     6.6%
s                    6243     6.1%
c                    6129     6.0%
l                    5422     5.3%
Other values (44)    26365    25.9%

Decimal Number
Value                Count    Frequency (%)
1                    237      21.7%
0                    236      21.6%
2                    139      12.7%
9                    89       8.1%
8                    75       6.9%
5                    73       6.7%
4                    68       6.2%
3                    67       6.1%
7                    62       5.7%
6                    48       4.4%

Space Separator
Value                Count    Frequency (%)
(space)              13968    100.0%

Connector Punctuation
Value                Count    Frequency (%)
_                    1        100.0%

Most occurring scripts

Value     Count     Frequency (%)
Latin     101635    87.0%
Common    15069     12.9%
Greek     71        0.1%

Most frequent character per script

Latin
Value                Count    Frequency (%)
e                    11940    11.7%
i                    9184     9.0%
a                    8898     8.8%
t                    7136     7.0%
o                    6932     6.8%
n                    6787     6.7%
r                    6676     6.6%
s                    6243     6.1%
c                    6129     6.0%
l                    5422     5.3%
Other values (25)    26288    25.9%

Greek
Value                Count    Frequency (%)
μ                    16       22.5%
κ                    8        11.3%
φ                    7        9.9%
β                    6        8.5%
τ                    4        5.6%
λ                    4        5.6%
ό                    4        5.6%
ς                    4        5.6%
σ                    3        4.2%
ρ                    3        4.2%
Other values (8)     12       16.9%

Common
Value                Count    Frequency (%)
(space)              13968    92.7%
1                    237      1.6%
0                    236      1.6%
2                    139      0.9%
9                    89       0.6%
8                    75       0.5%
5                    73       0.5%
4                    68       0.5%
3                    67       0.4%
7                    62       0.4%
Other values (3)     55       0.4%

Most occurring blocks

Value    Count     Frequency (%)
ASCII    116680    99.9%
None     95        0.1%

Most frequent character per block

ASCII
Value                Count    Frequency (%)
(space)              13968    12.0%
e                    11940    10.2%
i                    9184     7.9%
a                    8898     7.6%
t                    7136     6.1%
o                    6932     5.9%
n                    6787     5.8%
r                    6676     5.7%
s                    6243     5.4%
c                    6129     5.3%
Other values (28)    32787    28.1%

None
Value                Count    Frequency (%)
μ                    16       16.8%
κ                    8        8.4%
φ                    7        7.4%
µ                    6        6.3%
β                    6        6.3%
τ                    4        4.2%
ó                    4        4.2%
λ                    4        4.2%
ό                    4        4.2%
ς                    4        4.2%
Other values (18)    32       33.7%

word_count
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 133
Distinct (%): 74.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 129.6536313
Minimum: 9
Maximum: 530
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 9
5-th percentile: 29.9
Q1: 55
median: 102
Q3: 177
95-th percentile: 321.6
Maximum: 530
Range: 521
Interquartile range (IQR): 122

Descriptive statistics

Standard deviation: 98.10160192
Coefficient of variation (CV): 0.7566436894
Kurtosis: 1.615886312
Mean: 129.6536313
Median Absolute Deviation (MAD): 54
Skewness: 1.299885631
Sum: 23208
Variance: 9623.924299
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
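
Several of the word_count statistics above are simple functions of one another, which gives a quick way to sanity-check the figures. The sketch below uses only the standard definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum); it is not taken from the profiler, and the variable names are illustrative. All input numbers are copied from this section of the report.

```python
import math

# Figures copied from the word_count section of this report.
n_observations = 179
total = 23208
reported_mean = 129.6536313
reported_std = 98.10160192
reported_variance = 9623.924299
reported_cv = 0.7566436894
q1, q3 = 55, 177
minimum, maximum = 9, 530

# Derived quantities, using the standard textbook definitions.
derived_mean = total / n_observations
derived_cv = reported_std / reported_mean
derived_variance = reported_std ** 2
derived_iqr = q3 - q1
derived_range = maximum - minimum

print(derived_mean, derived_cv, derived_variance)
print(derived_iqr, derived_range)

# Each derived value should agree with the reported one up to rounding.
assert math.isclose(derived_mean, reported_mean, rel_tol=1e-6)
assert math.isclose(derived_cv, reported_cv, rel_tol=1e-6)
assert math.isclose(derived_variance, reported_variance, rel_tol=1e-6)
```

The derived IQR (122) and range (521) match the reported values exactly, and the remaining quantities agree to the precision shown in the report.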
Value                 Count    Frequency (%)
55                    4        2.2%
72                    3        1.7%
47                    3        1.7%
78                    3        1.7%
41                    3        1.7%
170                   3        1.7%
106                   3        1.7%
59                    3        1.7%
46                    3        1.7%
86                    3        1.7%
Other values (123)    148      82.7%

Value    Count    Frequency (%)
9        1        0.6%
10       1        0.6%
11       1        0.6%
15       2        1.1%
19       1        0.6%
20       1        0.6%
29       2        1.1%
30       1        0.6%
31       2        1.1%
32       1        0.6%

Value    Count    Frequency (%)
530      1        0.6%
450      1        0.6%
425      1        0.6%
400      1        0.6%
358      1        0.6%
354      1        0.6%
351      1        0.6%
336      1        0.6%
327      1        0.6%
321      1        0.6%

char_count
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 168
Distinct (%): 93.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 748.2960894
Minimum: 64
Maximum: 3079
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 64
5-th percentile: 174.7
Q1: 318
median: 575
Q3: 1042
95-th percentile: 1861.5
Maximum: 3079
Range: 3015
Interquartile range (IQR): 724

Descriptive statistics

Standard deviation: 561.5916251
Coefficient of variation (CV): 0.7504938661
Kurtosis: 1.577051016
Mean: 748.2960894
Median Absolute Deviation (MAD): 295
Skewness: 1.294963601
Sum: 133945
Variance: 315385.1534
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
373                   2        1.1%
298                   2        1.1%
1316                  2        1.1%
214                   2        1.1%
229                   2        1.1%
331                   2        1.1%
411                   2        1.1%
310                   2        1.1%
318                   2        1.1%
1046                  2        1.1%
Other values (158)    159      88.8%

Value    Count    Frequency (%)
64       1        0.6%
77       1        0.6%
100      1        0.6%
101      1        0.6%
109      1        0.6%
125      1        0.6%
131      1        0.6%
171      1        0.6%
172      1        0.6%
175      1        0.6%

Value    Count    Frequency (%)
3079     1        0.6%
2552     1        0.6%
2350     1        0.6%
2281     1        0.6%
2118     1        0.6%
1975     1        0.6%
1906     1        0.6%
1876     1        0.6%
1875     1        0.6%
1860     1        0.6%

sentence_count
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 26
Distinct (%): 14.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.25139665
Minimum: 2
Maximum: 46
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 2
5-th percentile: 3
Q1: 5
median: 8
Q3: 14
95-th percentile: 22.2
Maximum: 46
Range: 44
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 7.371249628
Coefficient of variation (CV): 0.7190483288
Kurtosis: 3.578145919
Mean: 10.25139665
Median Absolute Deviation (MAD): 4
Skewness: 1.627951936
Sum: 1835
Variance: 54.33532107
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)
Value                Count    Frequency (%)
4                    19       10.6%
6                    18       10.1%
7                    16       8.9%
5                    14       7.8%
3                    14       7.8%
9                    12       6.7%
10                   11       6.1%
12                   8        4.5%
8                    7        3.9%
18                   6        3.4%
Other values (16)    54       30.2%

Value    Count    Frequency (%)
2        6        3.4%
3        14       7.8%
4        19       10.6%
5        14       7.8%
6        18       10.1%
7        16       8.9%
8        7        3.9%
9        12       6.7%
10       11       6.1%
11       5        2.8%

Value    Count    Frequency (%)
46       1        0.6%
35       3        1.7%
27       1        0.6%
26       1        0.6%
25       2        1.1%
24       1        0.6%
22       3        1.7%
21       4        2.2%
20       6        3.4%
19       5        2.8%

avg_word_length
Real number (ℝ≥0)

Distinct: 175
Distinct (%): 97.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.867284924
Minimum: 4.727272727
Maximum: 10.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 4.727272727
5-th percentile: 5.109244662
Q1: 5.499084249
median: 5.813559322
Q3: 6.122863248
95-th percentile: 6.707590569
Maximum: 10.1
Range: 5.372727273
Interquartile range (IQR): 0.6237789988

Descriptive statistics

Standard deviation: 0.5755453755
Coefficient of variation (CV): 0.09809398775
Kurtosis: 15.52984099
Mean: 5.867284924
Median Absolute Deviation (MAD): 0.3153908239
Skewness: 2.439832391
Sum: 1050.244001
Variance: 0.3312524792
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
7                     3        1.7%
5.6                   2        1.1%
6.326530612           2        1.1%
5.983050847           1        0.6%
5.875                 1        0.6%
5.966101695           1        0.6%
5.488372093           1        0.6%
5.498168498           1        0.6%
5.486988848           1        0.6%
6.019607843           1        0.6%
Other values (165)    165      92.2%

Value          Count    Frequency (%)
4.727272727    1        0.6%
4.807692308    1        0.6%
4.872340426    1        0.6%
4.934782609    1        0.6%
4.9375         1        0.6%
5.048611111    1        0.6%
5.05           1        0.6%
5.103092784    1        0.6%
5.106145251    1        0.6%
5.109589041    1        0.6%

Value          Count    Frequency (%)
10.1           1        0.6%
7.266666667    1        0.6%
7.111111111    1        0.6%
7.090909091    1        0.6%
7.035460993    1        0.6%
7              3        1.7%
6.756756757    1        0.6%
6.70212766     1        0.6%
6.684931507    1        0.6%
6.666666667    1        0.6%

avg_sentence_length
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 152
Distinct (%): 84.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.64943597
Minimum: 4.5
Maximum: 26.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 4.5
5-th percentile: 6.666666667
Q1: 9.527777778
median: 11.875
Q3: 15.66666667
95-th percentile: 19.92
Maximum: 26.6
Range: 22.1
Interquartile range (IQR): 6.138888889

Descriptive statistics

Standard deviation: 4.391931159
Coefficient of variation (CV): 0.3472037147
Kurtosis: 0.205744776
Mean: 12.64943597
Median Absolute Deviation (MAD): 3.041666667
Skewness: 0.6626630912
Sum: 2264.249039
Variance: 19.2890593
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
17                    4        2.2%
7.5                   3        1.7%
13.66666667           3        1.7%
11                    3        1.7%
12.66666667           2        1.1%
8.166666667           2        1.1%
9.833333333           2        1.1%
18                    2        1.1%
8                     2        1.1%
7.4                   2        1.1%
Other values (142)    154      86.0%

Value          Count    Frequency (%)
4.5            1        0.6%
5              1        0.6%
5.5            1        0.6%
5.8            1        0.6%
5.846153846    1        0.6%
6.125          1        0.6%
6.333333333    1        0.6%
6.6            1        0.6%
6.666666667    2        1.1%
6.875          1        0.6%

Value          Count    Frequency (%)
26.6           1        0.6%
25.5           1        0.6%
24.5           1        0.6%
23.86666667    1        0.6%
23.4           1        0.6%
22.28571429    1        0.6%
22.125         1        0.6%
22.08333333    1        0.6%
20.4           1        0.6%
19.86666667    1        0.6%

name_word_count
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB

2    123
1    43
3    13

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 1
5th row: 2

Common Values

Value    Count    Frequency (%)
2        123      68.7%
1        43       24.0%
3        13       7.3%
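
The Frequency (%) column in these tables appears to be each count divided by the number of observations (179), rounded to one decimal place. A minimal sketch of that arithmetic for name_word_count follows; the rounding rule is an assumption inferred from the displayed figures, and the dictionary name `counts` is illustrative.

```python
# Counts for name_word_count, copied from the Common Values table.
counts = {"2": 123, "1": 43, "3": 13}

# The counts cover every row, so their sum is the number of observations.
n_observations = sum(counts.values())

# Percentage share of each value, rounded to one decimal place
# (assumed to mirror how the report displays frequencies).
frequencies = {
    value: round(100 * count / n_observations, 1)
    for value, count in counts.items()
}

for value, pct in frequencies.items():
    print(f"{value}: {pct}%")
```

With these counts the sketch reproduces the 68.7%, 24.0% and 7.3% shown in the table.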

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count    Frequency (%)
2        123      68.7%
1        43       24.0%
3        13       7.3%

Most occurring characters

Value    Count    Frequency (%)
2        123      68.7%
1        43       24.0%
3        13       7.3%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    179      100.0%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
2        123      68.7%
1        43       24.0%
3        13       7.3%

Most occurring scripts

Value     Count    Frequency (%)
Common    179      100.0%

Most frequent character per script

Common
Value    Count    Frequency (%)
2        123      68.7%
1        43       24.0%
3        13       7.3%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    179      100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
2        123      68.7%
1        43       24.0%
3        13       7.3%

name_char_count
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 24
Distinct (%): 13.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.07821229
Minimum: 6
Maximum: 30
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 6
5-th percentile: 8
Q1: 13
median: 18
Q3: 21
95-th percentile: 24.1
Maximum: 30
Range: 24
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.108323802
Coefficient of variation (CV): 0.2991134971
Kurtosis: -0.6184960472
Mean: 17.07821229
Median Absolute Deviation (MAD): 3
Skewness: -0.2402103769
Sum: 3057
Variance: 26.09497207
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Value                Count    Frequency (%)
18                   22       12.3%
20                   18       10.1%
21                   15       8.4%
19                   15       8.4%
17                   12       6.7%
10                   10       5.6%
12                   9        5.0%
23                   9        5.0%
22                   7        3.9%
8                    7        3.9%
Other values (14)    55       30.7%

Value    Count    Frequency (%)
6        1        0.6%
7        3        1.7%
8        7        3.9%
9        7        3.9%
10       10       5.6%
11       6        3.4%
12       9        5.0%
13       6        3.4%
14       5        2.8%
15       6        3.4%

Value    Count    Frequency (%)
30       1        0.6%
29       1        0.6%
27       1        0.6%
26       2        1.1%
25       4        2.2%
24       6        3.4%
23       9        5.0%
22       7        3.9%
21       15       8.4%
20       18       10.1%

name_avg_word_length
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 29
Distinct (%): 16.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.568901304
Minimum: 3.5
Maximum: 19
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 3.5
5-th percentile: 6
Q1: 8.5
median: 9.5
Q3: 11
95-th percentile: 13
Maximum: 19
Range: 15.5
Interquartile range (IQR): 2.5

Descriptive statistics

Standard deviation: 2.255434823
Coefficient of variation (CV): 0.2357046803
Kurtosis: 2.158351327
Mean: 9.568901304
Median Absolute Deviation (MAD): 1.5
Skewness: 0.423882704
Sum: 1712.833333
Variance: 5.086986239
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=29)
Value                Count    Frequency (%)
9                    27       15.1%
10                   25       14.0%
10.5                 13       7.3%
9.5                  12       6.7%
8                    12       6.7%
11                   11       6.1%
12                   11       6.1%
8.5                  11       6.1%
7                    8        4.5%
13                   8        4.5%
Other values (19)    41       22.9%

Value          Count    Frequency (%)
3.5            1        0.6%
3.666666667    1        0.6%
4              1        0.6%
5              2        1.1%
5.5            1        0.6%
5.666666667    1        0.6%
6              6        3.4%
6.333333333    2        1.1%
6.666666667    2        1.1%
7              8        4.5%

Value    Count    Frequency (%)
19       1        0.6%
18       1        0.6%
15       1        0.6%
14       1        0.6%
13.5     1        0.6%
13       8        4.5%
12.5     4        2.2%
12       11       6.1%
11.5     7        3.9%
11       11       6.1%

Polarity
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 156
Distinct (%): 87.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06782696168
Minimum: -0.2825
Maximum: 0.55
Zeros: 17
Zeros (%): 9.5%
Negative: 42
Negative (%): 23.5%
Memory size: 1.5 KiB

Quantile statistics

Minimum: -0.2825
5-th percentile: -0.1023212121
Q1: 0
median: 0.05583333333
Q3: 0.115
95-th percentile: 0.3116666667
Maximum: 0.55
Range: 0.8325
Interquartile range (IQR): 0.115

Descriptive statistics

Standard deviation: 0.1275124711
Coefficient of variation (CV): 1.879967316
Kurtosis: 2.931885608
Mean: 0.06782696168
Median Absolute Deviation (MAD): 0.05587454212
Skewness: 1.008554241
Sum: 12.14102614
Variance: 0.01625943028
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0                   17     9.5%
0.4                 2      1.1%
0.115               2      1.1%
-0.0625             2      1.1%
0.05                2      1.1%
0.06875             2      1.1%
-0.04166666667      2      1.1%
-0.1333333333       2      1.1%
-0.01956989247      1      0.6%
0.05870629371       1      0.6%
Other values (146)  146    81.6%

Smallest values
Value           Count  Frequency (%)
-0.2825         1      0.6%
-0.2776785714   1      0.6%
-0.2125         1      0.6%
-0.1625         1      0.6%
-0.1416666667   1      0.6%
-0.1375         1      0.6%
-0.1333333333   2      1.1%
-0.1166666667   1      0.6%
-0.1007272727   1      0.6%
-0.09166666667  1      0.6%

Largest values
Value         Count  Frequency (%)
0.55          1      0.6%
0.525         1      0.6%
0.5           1      0.6%
0.4666666667  1      0.6%
0.4           2      1.1%
0.3875        1      0.6%
0.375         1      0.6%
0.3333333333  1      0.6%
0.3092592593  1      0.6%
0.2857142857  1      0.6%

parsed
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 30.9 KiB

entity_tags
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 19.9 KiB

entity_types
Unsupported

REJECTED
UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 43.5 KiB

CARDINAL
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 11
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.273743017
Minimum: 0
Maximum: 14
Zeros: 92
Zeros (%): 51.4%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 2
95th percentile: 4.1
Maximum: 14
Range: 14
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.00222712
Coefficient of variation (CV): 1.571923923
Kurtosis: 10.92274489
Mean: 1.273743017
Median Absolute Deviation (MAD): 0
Skewness: 2.768483866
Sum: 228
Variance: 4.008913439
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)
Common values
Value  Count  Frequency (%)
0      92     51.4%
2      30     16.8%
1      28     15.6%
3      11     6.1%
4      9      5.0%
7      3      1.7%
8      2      1.1%
5      1      0.6%
9      1      0.6%
14     1      0.6%

Smallest values
Value  Count  Frequency (%)
0      92     51.4%
1      28     15.6%
2      30     16.8%
3      11     6.1%
4      9      5.0%
5      1      0.6%
6      1      0.6%
7      3      1.7%
8      2      1.1%
9      1      0.6%

Largest values
Value  Count  Frequency (%)
14     1      0.6%
9      1      0.6%
8      2      1.1%
7      3      1.7%
6      1      0.6%
5      1      0.6%
4      9      5.0%
3      11     6.1%
2      30     16.8%
1      28     15.6%

DATE
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5865921788
Minimum: 0
Maximum: 8
Zeros: 121
Zeros (%): 67.6%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.164570185
Coefficient of variation (CV): 1.985314886
Kurtosis: 12.50625325
Mean: 0.5865921788
Median Absolute Deviation (MAD): 0
Skewness: 3.100941333
Sum: 105
Variance: 1.356223715
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Common values
Value  Count  Frequency (%)
0      121    67.6%
1      36     20.1%
2      10     5.6%
3      6      3.4%
4      3      1.7%
5      1      0.6%
8      1      0.6%
6      1      0.6%

Smallest values
Value  Count  Frequency (%)
0      121    67.6%
1      36     20.1%
2      10     5.6%
3      6      3.4%
4      3      1.7%
5      1      0.6%
6      1      0.6%
8      1      0.6%

Largest values
Value  Count  Frequency (%)
8      1      0.6%
6      1      0.6%
5      1      0.6%
4      3      1.7%
3      6      3.4%
2      10     5.6%
1      36     20.1%
0      121    67.6%

EVENT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (174), 1 (5)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      174    97.2%
1      5      2.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      174    97.2%
1      5      2.8%

Most occurring characters

Value  Count  Frequency (%)
0      174    97.2%
1      5      2.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      174    97.2%
1      5      2.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      174    97.2%
1      5      2.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      174    97.2%
1      5      2.8%

FAC
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (172), 1 (7)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      172    96.1%
1      7      3.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      172    96.1%
1      7      3.9%

Most occurring characters

Value  Count  Frequency (%)
0      172    96.1%
1      7      3.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      172    96.1%
1      7      3.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      172    96.1%
1      7      3.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      172    96.1%
1      7      3.9%

GPE
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4860335196
Minimum: 0
Maximum: 5
Zeros: 134
Zeros (%): 74.9%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.5
95th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 1.045899118
Coefficient of variation (CV): 2.151907381
Kurtosis: 7.241646686
Mean: 0.4860335196
Median Absolute Deviation (MAD): 0
Skewness: 2.644464033
Sum: 87
Variance: 1.093904965
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Common values
Value  Count  Frequency (%)
0      134    74.9%
1      23     12.8%
2      11     6.1%
3      6      3.4%
5      4      2.2%
4      1      0.6%

Smallest values
Value  Count  Frequency (%)
0      134    74.9%
1      23     12.8%
2      11     6.1%
3      6      3.4%
4      1      0.6%
5      4      2.2%

Largest values
Value  Count  Frequency (%)
5      4      2.2%
4      1      0.6%
3      6      3.4%
2      11     6.1%
1      23     12.8%
0      134    74.9%

LANGUAGE
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (178), 1 (1)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      178    99.4%
1      1      0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      178    99.4%
1      1      0.6%

Most occurring characters

Value  Count  Frequency (%)
0      178    99.4%
1      1      0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      178    99.4%
1      1      0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      178    99.4%
1      1      0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      178    99.4%
1      1      0.6%

LAW
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (179)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      179    100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      179    100.0%

Most occurring characters

Value  Count  Frequency (%)
0      179    100.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      179    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      179    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      179    100.0%

LOC
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (155), 1 (17), 2 (5), 3 (2)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      155    86.6%
1      17     9.5%
2      5      2.8%
3      2      1.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      155    86.6%
1      17     9.5%
2      5      2.8%
3      2      1.1%

Most occurring characters

Value  Count  Frequency (%)
0      155    86.6%
1      17     9.5%
2      5      2.8%
3      2      1.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      155    86.6%
1      17     9.5%
2      5      2.8%
3      2      1.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      155    86.6%
1      17     9.5%
2      5      2.8%
3      2      1.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      155    86.6%
1      17     9.5%
2      5      2.8%
3      2      1.1%

MONEY
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (179)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      179    100.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      179    100.0%

Most occurring characters

Value  Count  Frequency (%)
0      179    100.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      179    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      179    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      179    100.0%

NORP
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (147), 1 (23), 2 (4), 4 (3), 3 (2)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      147    82.1%
1      23     12.8%
2      4      2.2%
4      3      1.7%
3      2      1.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      147    82.1%
1      23     12.8%
2      4      2.2%
4      3      1.7%
3      2      1.1%

Most occurring characters

Value  Count  Frequency (%)
0      147    82.1%
1      23     12.8%
2      4      2.2%
4      3      1.7%
3      2      1.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      147    82.1%
1      23     12.8%
2      4      2.2%
4      3      1.7%
3      2      1.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      147    82.1%
1      23     12.8%
2      4      2.2%
4      3      1.7%
3      2      1.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      147    82.1%
1      23     12.8%
2      4      2.2%
4      3      1.7%
3      2      1.1%

ORDINAL
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (149), 1 (21), 2 (7), 3 (2)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      149    83.2%
1      21     11.7%
2      7      3.9%
3      2      1.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      149    83.2%
1      21     11.7%
2      7      3.9%
3      2      1.1%

Most occurring characters

Value  Count  Frequency (%)
0      149    83.2%
1      21     11.7%
2      7      3.9%
3      2      1.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      149    83.2%
1      21     11.7%
2      7      3.9%
3      2      1.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      149    83.2%
1      21     11.7%
2      7      3.9%
3      2      1.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      149    83.2%
1      21     11.7%
2      7      3.9%
3      2      1.1%

ORG
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.078212291
Minimum: 0
Maximum: 10
Zeros: 88
Zeros (%): 49.2%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 2
95th percentile: 4
Maximum: 10
Range: 10
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.508226557
Coefficient of variation (CV): 1.398821522
Kurtosis: 6.842243999
Mean: 1.078212291
Median Absolute Deviation (MAD): 1
Skewness: 2.121183256
Sum: 193
Variance: 2.274747348
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Common values
Value  Count  Frequency (%)
0      88     49.2%
1      42     23.5%
2      23     12.8%
3      11     6.1%
4      9      5.0%
5      4      2.2%
6      1      0.6%
10     1      0.6%

Smallest values
Value  Count  Frequency (%)
0      88     49.2%
1      42     23.5%
2      23     12.8%
3      11     6.1%
4      9      5.0%
5      4      2.2%
6      1      0.6%
10     1      0.6%

Largest values
Value  Count  Frequency (%)
10     1      0.6%
6      1      0.6%
5      4      2.2%
4      9      5.0%
3      11     6.1%
2      23     12.8%
1      42     23.5%
0      88     49.2%

PERCENT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (157), 1 (13), 2 (8), 3 (1)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      157    87.7%
1      13     7.3%
2      8      4.5%
3      1      0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      157    87.7%
1      13     7.3%
2      8      4.5%
3      1      0.6%

Most occurring characters

Value  Count  Frequency (%)
0      157    87.7%
1      13     7.3%
2      8      4.5%
3      1      0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      157    87.7%
1      13     7.3%
2      8      4.5%
3      1      0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      157    87.7%
1      13     7.3%
2      8      4.5%
3      1      0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      157    87.7%
1      13     7.3%
2      8      4.5%
3      1      0.6%

PERSON
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 10
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.379888268
Minimum: 0
Maximum: 9
Zeros: 76
Zeros (%): 42.5%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 2
95th percentile: 5
Maximum: 9
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.796012434
Coefficient of variation (CV): 1.301563666
Kurtosis: 3.785793805
Mean: 1.379888268
Median Absolute Deviation (MAD): 1
Skewness: 1.827837447
Sum: 247
Variance: 3.225660662
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
0      76     42.5%
1      42     23.5%
2      27     15.1%
4      13     7.3%
3      11     6.1%
5      4      2.2%
8      3      1.7%
7      1      0.6%
9      1      0.6%
6      1      0.6%

Smallest values
Value  Count  Frequency (%)
0      76     42.5%
1      42     23.5%
2      27     15.1%
3      11     6.1%
4      13     7.3%
5      4      2.2%
6      1      0.6%
7      1      0.6%
8      3      1.7%
9      1      0.6%

Largest values
Value  Count  Frequency (%)
9      1      0.6%
8      3      1.7%
7      1      0.6%
6      1      0.6%
5      4      2.2%
4      13     7.3%
3      11     6.1%
2      27     15.1%
1      42     23.5%
0      76     42.5%

PRODUCT
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (156), 1 (17), 2 (4), 4 (1), 3 (1)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 1.1%

Sample

1st row: 2
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      156    87.2%
1      17     9.5%
2      4      2.2%
4      1      0.6%
3      1      0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      156    87.2%
1      17     9.5%
2      4      2.2%
4      1      0.6%
3      1      0.6%

Most occurring characters

Value  Count  Frequency (%)
0      156    87.2%
1      17     9.5%
2      4      2.2%
4      1      0.6%
3      1      0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      156    87.2%
1      17     9.5%
2      4      2.2%
4      1      0.6%
3      1      0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      156    87.2%
1      17     9.5%
2      4      2.2%
4      1      0.6%
3      1      0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      156    87.2%
1      17     9.5%
2      4      2.2%
4      1      0.6%
3      1      0.6%

QUANTITY
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (172), 1 (6), 2 (1)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      172    96.1%
1      6      3.4%
2      1      0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      172    96.1%
1      6      3.4%
2      1      0.6%

Most occurring characters

Value  Count  Frequency (%)
0      172    96.1%
1      6      3.4%
2      1      0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      172    96.1%
1      6      3.4%
2      1      0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      172    96.1%
1      6      3.4%
2      1      0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      172    96.1%
1      6      3.4%
2      1      0.6%

TIME
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (173), 1 (6)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      173    96.6%
1      6      3.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      173    96.6%
1      6      3.4%

Most occurring characters

Value  Count  Frequency (%)
0      173    96.6%
1      6      3.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      173    96.6%
1      6      3.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      173    96.6%
1      6      3.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      173    96.6%
1      6      3.4%

WORK_OF_ART
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 10.3 KiB
Category counts: 0 (175), 1 (3), 2 (1)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 179
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      175    97.8%
1      3      1.7%
2      1      0.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      175    97.8%
1      3      1.7%
2      1      0.6%

Most occurring characters

Value  Count  Frequency (%)
0      175    97.8%
1      3      1.7%
2      1      0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  179    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      175    97.8%
1      3      1.7%
2      1      0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  179    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      175    97.8%
1      3      1.7%
2      1      0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  179    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      175    97.8%
1      3      1.7%
2      1      0.6%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
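
As a minimal sketch of this calculation, assuming ties receive the average of their rank positions (the helper names and toy data below are illustrative and not part of the report), ρ can be computed by ranking both variables and applying the covariance-over-standard-deviations formula to the ranks:

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]  # monotonic but nonlinear relationship
print(spearman(x, y), pearson(x, y))
```

On this toy data ρ is essentially 1 while r is noticeably lower, which illustrates why ρ is better suited to nonlinear monotonic relationships.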

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
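
A minimal sketch of this formula follows, using illustrative toy data and a hypothetical helper name. It also checks the invariance property: shifting and positively rescaling either variable leaves r unchanged.

```python
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/n factors of the covariance and both deviations cancel out.
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
# Separate changes in location and (positive) scale of each variable.
r_transformed = pearson_r([2 * a + 3 for a in x], [0.5 * b - 1 for b in y])
print(r, r_transformed)
```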

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
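
The pair-counting definition can be sketched directly, as below. This is the untied form (often called tau-a) with an illustrative function name and toy data; library implementations commonly apply a tie correction (tau-b), so results may differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither.
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
# 8 concordant and 2 discordant pairs out of 10.
print(kendall_tau_a(x, y))
```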

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
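
The following is a minimal sketch of a bias-corrected Cramér's V, assuming the commonly cited form of Bergsma's correction and a plain chi-squared statistic without continuity correction. The function name and toy data are illustrative, and the report's own implementation may differ in detail.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row, row_n in row_totals.items():
        for col, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    # A constant variable (one category) gives denom == 0: report no association.
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


perfect = cramers_v_corrected(["x", "y"] * 10, ["p", "q"] * 10)
independent = cramers_v_corrected(["x", "x", "y", "y"] * 5, ["p", "q", "p", "q"] * 5)
print(perfect, independent)
```

In this sketch a constant column, such as LAW or MONEY in this dataset, yields 0 rather than a division by zero; that handling is a design choice of the example, not something the report states.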

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

df_index | Name | Description | Type | lang | Description_clean | word_count | char_count | sentence_count | avg_word_length | avg_sentence_length | name_word_count | name_char_count | name_avg_word_length | Polarity | parsed | entity_tags | entity_types | CARDINAL | DATE | EVENT | FAC | GPE | LANGUAGE | LAW | LOC | MONEY | NORP | ORDINAL | ORG | PERCENT | PERSON | PRODUCT | QUANTITY | TIME | WORK_OF_ART
00Acinetobacter baumanniiAcinetobacter baumannii is a typically short, almost round, rod-shaped (coccobacillus) Gram-negative bacterium. It is named after the bacteriologist Paul Baumann. It can be an opportunistic pathogen in humans, affecting people with compromised immune systems, and is becoming increasingly important as a hospital-derived (nosocomial) infection. While other species of the genus Acinetobacter are often found in soil samples (leading to the common misconception that A. baumannii is a soil organism, too), it is almost exclusively isolated from hospital environments. Although occasionally it has been found in environmental soil and water samples, its natural habitat is still not known.\nBacteria of this genus lack flagella, whip-like structures many bacteria use for locomotion, but exhibit twitching or swarming motility. This may be due to the activity of type IV pili, pole-like structures that can be extended and retracted. Motility in A. baumannii may also be due to the excretion of exopolysaccharide, creating a film of high-molecular-weight sugar chains behind the bacterium to move forward. Clinical microbiologists typically differentiate members of the genus Acinetobacter from other Moraxellaceae by performing an oxidase test, as Acinetobacter spp. are the only members of the Moraxellaceae to lack cytochrome c oxidases.A. baumannii is part of the ACB complex (A. baumannii, A. calcoaceticus, and Acinetobacter genomic species 13TU). It is difficult to determine the specific species of members of the ACB complex and they comprise the most clinically relevant members of the genus. A. baumannii has also been identified as an ESKAPE pathogen (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species), a group of pathogens with a high rate of antibiotic resistance that are responsible for the majority of nosocomial infections.Colloquially, A. 
baumannii is referred to as "Iraqibacter" due to its seemingly sudden emergence in military treatment facilities during the Iraq War. It has continued to be an issue for veterans and soldiers who served in Iraq and Afghanistan. Multidrug-resistant A. baumannii has spread to civilian hospitals in part due to the transport of infected soldiers through multiple medical facilities. During the COVID-19 pandemic, coinfection with A. baumannii secondary to SARS-CoV-2 infections has been reported multiple times in literature.Bacteriaenacinetobacter baumannii typically short almost round rodshaped coccobacillus gramnegative bacterium named bacteriologist paul baumann opportunistic pathogen human affecting people compromised immune system becoming increasingly important hospitalderived nosocomial infection specie genus acinetobacter often found soil sample leading common misconception baumannii soil organism almost exclusively isolated hospital environment although occasionally found environmental soil water sample natural habitat still known bacteria genus lack flagellum whiplike structure many bacteria use locomotion exhibit twitching swarming motility may due activity type iv pili polelike structure extended retracted motility baumannii may also due excretion exopolysaccharide creating film highmolecularweight sugar chain behind bacterium move forward clinical microbiologist typically differentiate member genus acinetobacter moraxellaceae performing oxidase test acinetobacter spp member moraxellaceae lack cytochrome c oxidasesa baumannii part acb complex baumannii calcoaceticus acinetobacter genomic specie 13tu difficult determine specific specie member acb complex comprise clinically relevant member genus baumannii also identified eskape pathogen enterococcus faecium staphylococcus aureus klebsiella pneumoniae acinetobacter baumannii pseudomonas aeruginosa enterobacter specie group pathogen high rate antibiotic resistance responsible majority nosocomial 
infectionscolloquially baumannii referred iraqibacter due seemingly sudden emergence military treatment facility iraq war continued issue veteran soldier served iraq afghanistan multidrugresistant baumannii spread civilian hospital part due transport infected soldier multiple medical facility covid19 pandemic coinfection baumannii secondary sarscov2 infection reported multiple time literature3542118275.98305113.11111122211.0-0.019570(acinetobacter, baumannii, typically, short, almost, round, rodshaped, coccobacillus, gramnegative, bacterium, named, bacteriologist, paul, baumann, opportunistic, pathogen, human, affecting, people, compromised, immune, system, becoming, increasingly, important, hospitalderived, nosocomial, infection, specie, genus, acinetobacter, often, found, soil, sample, leading, common, misconception, baumannii, soil, organism, almost, exclusively, isolated, hospital, environment, although, occasionally, found, environmental, soil, water, sample, natural, habitat, still, known, bacteria, genus, lack, flagellum, whiplike, structure, many, bacteria, use, locomotion, exhibit, twitching, swarming, motility, may, due, activity, type, iv, pili, polelike, structure, extended, retracted, motility, baumannii, may, also, due, excretion, exopolysaccharide, creating, film, highmolecularweight, sugar, chain, behind, bacterium, move, forward, clinical, microbiologist, typically, ...)[(Paul Baumann, PERSON), (Acinetobacter, LOC), (IV pili, PERSON), (Moraxellaceae, PERSON), (ACB, ORG), (Acinetobacter, PRODUCT), (13TU, CARDINAL), (ACB, ORG), (ESKAPE, ORG), (Acinetobacter, PRODUCT), (Enterobacter, PERSON), (the Iraq War, EVENT), (Iraq, GPE), (Afghanistan, GPE)][1, 0, 1, 0, 2, 0, 0, 1, 0, 0, 0, 3, 0, 4, 2, 0, 0, 0]101020010003042000
1 | 1 | Actinomyces israelii | Actinomyces israelii is a species of Gram-positive, rod-shaped bacteria within the genus Actinomyces. Known to live commensally on and within humans, A. israelii is an opportunistic pathogen and a cause of actinomycosis. Many physiologically diverse strains of the species are known to exist, though not all are strict anaerobes. It was named after the German surgeon James Adolf Israel (1848–1926), who studied the organism for the first time in 1878. | Bacteria | en | actinomyces israelii specie grampositive rodshaped bacteria within genus actinomyces known live commensally within human israelii opportunistic pathogen cause actinomycosis many physiologically diverse strain specie known exist though strict anaerobe named german surgeon james adolf israel 18481926 studied organism first time 1878 | 72 | 383 | 6 | 5.319444 | 12.000000 | 2 | 19 | 9.5 | 0.221591 | (actinomyces, israelii, specie, grampositive, rodshaped, bacteria, within, genus, actinomyces, known, live, commensally, within, human, israelii, opportunistic, pathogen, cause, actinomycosis, many, physiologically, diverse, strain, specie, known, exist, though, strict, anaerobe, named, german, surgeon, james, adolf, israel, 18481926, studied, organism, first, time, 1878) | [(Actinomyces, ORG), (Actinomyces, ORG), (German, NORP), (James Adolf, PERSON), (Israel, GPE), (1848–1926, CARDINAL), (first, ORDINAL), (1878, DATE)] | [1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 2, 0, 1, 0, 0, 0, 0] | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 0
22Agrobacterium tumefaciensAgrobacterium radiobacter (more commonly known as Agrobacterium tumefaciens) is the causal agent of crown gall disease (the formation of tumours) in over 140 species of eudicots. It is a rod-shaped, Gram-negative soil bacterium. Symptoms are caused by the insertion of a small segment of DNA (known as the T-DNA, for 'transfer DNA', not to be confused with tRNA that transfers amino acids during protein synthesis), from a plasmid into the plant cell, which is incorporated at a semi-random location into the plant genome. Plant genomes can be engineered by use of Agrobacterium for the delivery of sequences hosted in T-DNA binary vectors.\nAgrobacterium tumefaciens is an alphaproteobacterium of the family Rhizobiaceae, which includes the nitrogen-fixing legume symbionts. Unlike the nitrogen-fixing symbionts, tumor-producing Agrobacterium species are pathogenic and do not benefit the plant. The wide variety of plants affected by Agrobacterium makes it of great concern to the agriculture industry.Economically, A. tumefaciens is a serious pathogen of walnuts, grape vines, stone fruits, nut trees, sugar beets, horse radish, and rhubarb, and the persistent nature of the tumors or galls caused by the disease make it particularly harmful for perennial crops.Agrobacterium tumefaciens grows optimally at 28 °C. The doubling time can range from 2.5–4h depending on the media, culture format, and level of aeration. At temperatures above 30 °C, A. 
tumefaciens begins to experience heat shock which is likely to result in errors in cell division.Bacteriaenagrobacterium radiobacter commonly known agrobacterium tumefaciens causal agent crown gall disease formation tumour 140 specie eudicots rodshaped gramnegative soil bacterium symptom caused insertion small segment dna known tdna transfer dna confused trna transfer amino acid protein synthesis plasmid plant cell incorporated semirandom location plant genome plant genome engineered use agrobacterium delivery sequence hosted tdna binary vector agrobacterium tumefaciens alphaproteobacterium family rhizobiaceae includes nitrogenfixing legume symbionts unlike nitrogenfixing symbionts tumorproducing agrobacterium specie pathogenic benefit plant wide variety plant affected agrobacterium make great concern agriculture industryeconomically tumefaciens serious pathogen walnut grape vine stone fruit nut tree sugar beet horse radish rhubarb persistent nature tumor gall caused disease make particularly harmful perennial cropsagrobacterium tumefaciens grows optimally 28 c doubling time range 254h depending medium culture format level aeration temperature 30 c tumefaciens begin experience heat shock likely result error cell division2351316155.60000015.66666722412.00.008333(agrobacterium, radiobacter, commonly, known, agrobacterium, tumefaciens, causal, agent, crown, gall, disease, formation, tumour, 140, specie, eudicots, rodshaped, gramnegative, soil, bacterium, symptom, caused, insertion, small, segment, dna, known, tdna, transfer, dna, confused, trna, transfer, amino, acid, protein, synthesis, plasmid, plant, cell, incorporated, semirandom, location, plant, genome, plant, genome, engineered, use, agrobacterium, delivery, sequence, hosted, tdna, binary, vector, agrobacterium, tumefaciens, alphaproteobacterium, family, rhizobiaceae, includes, nitrogenfixing, legume, symbionts, unlike, nitrogenfixing, symbionts, tumorproducing, agrobacterium, specie, pathogenic, benefit, 
plant, wide, variety, plant, affected, agrobacterium, make, great, concern, agriculture, industryeconomically, tumefaciens, serious, pathogen, walnut, grape, vine, stone, fruit, nut, tree, sugar, beet, horse, radish, rhubarb, persistent, ...)[(over 140, CARDINAL), (Rhizobiaceae, PERSON), (28 °C, CARDINAL), (2.5–4h, CARDINAL)][3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]300000000000010000
33AnaplasmaAnaplasma is a genus of bacteria of the alphaproteobacterial order Rickettsiales, family Anaplasmataceae.\nAnaplasma species reside in host blood cells and lead to the disease anaplasmosis. The disease most commonly occurs in areas where competent tick vectors are indigenous, including tropical and semitropical areas of the world for intraerythrocytic Anaplasma spp.Anaplasma species are biologically transmitted by Ixodes deer-tick vectors, and the prototypical species, A. marginale, can be mechanically transmitted by biting flies and iatrogenically with blood-contaminated instruments. One of the major consequences of infection by bovine red blood cells by A. marginale is the development of nonhaemolytic anaemia, thus the absence of hemoglobinuria, which allows clinical differentiation from another major tick-borne disease, bovine babesiosis, caused by Babesia bigemina.Species of veterinary interest include:\n\nAnaplasma marginale and Anaplasma centrale in cattle\nAnaplasma ovis and Anaplasma mesaeterum in sheep and goats\nAnaplasma phagocytophilum in dogs, cats, and horses (see human granulocytic anaplasmosis)\nAnaplasma platys in dogsBacteriaenanaplasma genus bacteria alphaproteobacterial order rickettsiales family anaplasmataceae anaplasma specie reside host blood cell lead disease anaplasmosis disease commonly occurs area competent tick vector indigenous including tropical semitropical area world intraerythrocytic anaplasma sppanaplasma specie biologically transmitted ixodes deertick vector prototypical specie marginale mechanically transmitted biting fly iatrogenically bloodcontaminated instrument one major consequence infection bovine red blood cell marginale development nonhaemolytic anaemia thus absence hemoglobinuria allows clinical differentiation another major tickborne disease bovine babesiosis caused babesia bigeminaspecies veterinary interest include anaplasma marginale anaplasma centrale cattle anaplasma ovis anaplasma mesaeterum sheep goat 
anaplasma phagocytophilum dog cat horse see human granulocytic anaplasmosis anaplasma platy dog148100086.75675718.500000199.00.101562(anaplasma, genus, bacteria, alphaproteobacterial, order, rickettsiales, family, anaplasmataceae, anaplasma, specie, reside, host, blood, cell, lead, disease, anaplasmosis, disease, commonly, occurs, area, competent, tick, vector, indigenous, including, tropical, semitropical, area, world, intraerythrocytic, anaplasma, sppanaplasma, specie, biologically, transmitted, ixodes, deertick, vector, prototypical, specie, marginale, mechanically, transmitted, biting, fly, iatrogenically, bloodcontaminated, instrument, one, major, consequence, infection, bovine, red, blood, cell, marginale, development, nonhaemolytic, anaemia, thus, absence, hemoglobinuria, allows, clinical, differentiation, another, major, tickborne, disease, bovine, babesiosis, caused, babesia, bigeminaspecies, veterinary, interest, include, anaplasma, marginale, anaplasma, centrale, cattle, anaplasma, ovis, anaplasma, mesaeterum, sheep, goat, anaplasma, phagocytophilum, dog, cat, horse, see, human, granulocytic, anaplasmosis, anaplasma, ...)[(Anaplasma, PERSON), (Anaplasmataceae, ORG), (Anaplasma, PERSON), (Anaplasma, PERSON), (Anaplasma, PERSON), (Ixodes, ORG), (One, CARDINAL), (Anaplasma, PERSON), (Anaplasma, GPE), (Anaplasma ovis, PERSON), (Anaplasma, PERSON), (Anaplasma, PERSON)][1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 8, 0, 0, 0, 0]100010000002080000
4 | 4 | Anaplasma phagocytophilum | Anaplasma phagocytophilum (formerly Ehrlichia phagocytophilum) is a Gram-negative bacterium that is unusual in its tropism to neutrophils. It causes anaplasmosis in sheep and cattle, also known as tick-borne fever and pasture fever, and also causes the zoonotic disease human granulocytic anaplasmosis.A. phagocytophilum is a Gram-negative, obligate bacterium of neutrophils. It causes human granulocytic anaplasmosis, which is a tick-borne rickettsial disease. Because this bacterium invades neutrophils, it has a unique adaptation and pathogenetic mechanism. | Bacteria | en | anaplasma phagocytophilum formerly ehrlichia phagocytophilum gramnegative bacterium unusual tropism neutrophil cause anaplasmosis sheep cattle also known tickborne fever pasture fever also cause zoonotic disease human granulocytic anaplasmosisa phagocytophilum gramnegative obligate bacterium neutrophil cause human granulocytic anaplasmosis tickborne rickettsial disease bacterium invades neutrophil unique adaptation pathogenetic mechanism | 73 | 488 | 7 | 6.684932 | 10.428571 | 2 | 24 | 12.0 | 0.115000 | (anaplasma, phagocytophilum, formerly, ehrlichia, phagocytophilum, gramnegative, bacterium, unusual, tropism, neutrophil, cause, anaplasmosis, sheep, cattle, also, known, tickborne, fever, pasture, fever, also, cause, zoonotic, disease, human, granulocytic, anaplasmosisa, phagocytophilum, gramnegative, obligate, bacterium, neutrophil, cause, human, granulocytic, anaplasmosis, tickborne, rickettsial, disease, bacterium, invades, neutrophil, unique, adaptation, pathogenetic, mechanism) | [(Anaplasma, PERSON), (Ehrlichia, ORG), (A. phagocytophilum, PERSON)] | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0
5 | 5 | Azorhizobium caulinodans | Azorhizobium caulinodans is a species of bacteria that forms a nitrogen-fixing symbiosis with plants of the genus Sesbania. The symbiotic relationship between Sesbania rostrata and A. caulinodans lead to nitrogen fixing nodules in S. rostrata. Bacterial chemotaxis plays an important role in establishing this symbiotic relationship.Azorhizobium caulinodans is a genome and it contains chemotaxis gene clusters that are unique. It has five chemotaxis genes which are: cheW(1), cheW, cheA, cheR, and cheB. Azorhizobium caulinodans controls the movements of flagella, and the chemotaxis signaling path in Azorhizobium caulinodans helps with regulating biofilm formation. | Bacteria | en | azorhizobium caulinodans specie bacteria form nitrogenfixing symbiosis plant genus sesbania symbiotic relationship sesbania rostrata caulinodans lead nitrogen fixing nodule rostrata bacterial chemotaxis play important role establishing symbiotic relationshipazorhizobium caulinodans genome contains chemotaxis gene cluster unique five chemotaxis gene chew1 chew chea cher cheb azorhizobium caulinodans control movement flagellum chemotaxis signaling path azorhizobium caulinodans help regulating biofilm formation | 92 | 577 | 9 | 6.271739 | 10.222222 | 2 | 23 | 11.5 | 0.387500 | (azorhizobium, caulinodans, specie, bacteria, form, nitrogenfixing, symbiosis, plant, genus, sesbania, symbiotic, relationship, sesbania, rostrata, caulinodans, lead, nitrogen, fixing, nodule, rostrata, bacterial, chemotaxis, play, important, role, establishing, symbiotic, relationshipazorhizobium, caulinodans, genome, contains, chemotaxis, gene, cluster, unique, five, chemotaxis, gene, chew1, chew, chea, cher, cheb, azorhizobium, caulinodans, control, movement, flagellum, chemotaxis, signaling, path, azorhizobium, caulinodans, help, regulating, biofilm, formation) | [(Azorhizobium, ORG), (Sesbania, GPE), (Sesbania, GPE), (five, CARDINAL), (cheW, cheA, ORG), (cheR, ORG), (Azorhizobium, ORG), (Azorhizobium, ORG)] | [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0] | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0
6 | 6 | Azotobacter vinelandii | Azotobacter vinelandii is Gram-negative diazotroph that can fix nitrogen while grown aerobically. These bacteria are easily cultured and grown.\nA. vinelandii is a free-living N2 fixer known to produce many phytohormones and vitamins in soils. It produces fluorescent pyoverdine pigments. | Bacteria | en | azotobacter vinelandii gramnegative diazotroph fix nitrogen grown aerobically bacteria easily cultured grown vinelandii freeliving n2 fixer known produce many phytohormone vitamin soil produce fluorescent pyoverdine pigment | 40 | 249 | 6 | 6.225000 | 6.666667 | 2 | 21 | 10.5 | 0.466667 | (azotobacter, vinelandii, gramnegative, diazotroph, fix, nitrogen, grown, aerobically, bacteria, easily, cultured, grown, vinelandii, freeliving, n2, fixer, known, produce, many, phytohormone, vitamin, soil, produce, fluorescent, pyoverdine, pigment) | [(N2, CARDINAL)] | [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
77BacillusBacillus (Latin "stick") is a genus of Gram-positive, rod-shaped bacteria, a member of the phylum Bacillota, with 266 named species. The term is also used to describe the shape (rod) of certain bacteria; and the plural Bacilli is the name of the class of bacteria to which this genus belongs. Bacillus species can be either obligate aerobes: oxygen dependent; or facultative anaerobes: having the ability to continue living in the absence of oxygen. Cultured Bacillus species test positive for the enzyme catalase if oxygen has been used or is present.Bacillus can reduce themselves to oval endospores and can remain in this dormant state for years. The endospore of one species from Morocco is reported to have survived being heated to 420 °C. Endospore formation is usually triggered by a lack of nutrients: the bacterium divides within its cell wall, and one side then engulfs the other. They are not true spores (i.e., not an offspring). Endospore formation originally defined the genus, but not all such species are closely related, and many species have been moved to other genera of the Bacillota. Only one endospore is formed per cell. The spores are resistant to heat, cold, radiation, desiccation, and disinfectants. Bacillus anthracis needs oxygen to sporulate; this constraint has important consequences for epidemiology and control. In vivo, B. anthracis produces a polypeptide (polyglutamic acid) capsule that kills it from phagocytosis. The genera Bacillus and Clostridium constitute the family Bacillaceae. Species are identified by using morphologic and biochemical criteria. Because the spores of many Bacillus species are resistant to heat, radiation, disinfectants, and desiccation, they are difficult to eliminate from medical and pharmaceutical materials and are a frequent cause of contamination. Not only are they resistant to heat, radiation, etc., but they are also resistant to chemicals such as antibiotics. 
This resistance allows them to survive for many years and especially in a controlled environment. Bacillus species are well known in the food industries as troublesome spoilage organisms.Ubiquitous in nature, Bacillus includes symbiotic(sometimes referred to as endophytes) as well as independent species. Two parasitic pathogenic species are medically significant: B. anthracis causes anthrax; and B. cereus causes food poisoning.\nMany species of Bacillus can produce copious amounts of enzymes, which are used in various industries, such as in the production of alpha amylase used in starch hydrolysis and the protease subtilisin used in detergents. B. subtilis is a valuable model for bacterial research. \nSome Bacillus species can synthesize and secrete lipopeptides, in particular surfactins and mycosubtilins. Bacillus species are also found in marine sponges. Marine sponge associated Bacillus subtilis (strains WS1A and YBS29) can synthesize several antimicrobial peptides. These Bacillus subtilis strains can develop disease resistance in Labeo rohita.Bacteriaenbacillus latin stick genus grampositive rodshaped bacteria member phylum bacillota 266 named specie term also used describe shape rod certain bacteria plural bacillus name class bacteria genus belongs bacillus specie either obligate aerobe oxygen dependent facultative anaerobe ability continue living absence oxygen cultured bacillus specie test positive enzyme catalase oxygen used presentbacillus reduce oval endospore remain dormant state year endospore one specie morocco reported survived heated 420 c endospore formation usually triggered lack nutrient bacterium divide within cell wall one side engulfs true spore ie offspring endospore formation originally defined genus specie closely related many specie moved genus bacillota one endospore formed per cell spore resistant heat cold radiation desiccation disinfectant bacillus anthracis need oxygen sporulate constraint important consequence epidemiology control 
vivo b anthracis produce polypeptide polyglutamic acid capsule kill phagocytosis genus bacillus clostridium constitute family bacillaceae specie identified using morphologic biochemical criterion spore many bacillus specie resistant heat radiation disinfectant desiccation difficult eliminate medical pharmaceutical material frequent cause contamination resistant heat radiation etc also resistant chemical antibiotic resistance allows survive many year especially controlled environment bacillus specie well known food industry troublesome spoilage organismsubiquitous nature bacillus includes symbioticsometimes referred endophytes well independent specie two parasitic pathogenic specie medically significant b anthracis cause anthrax b cereus cause food poisoning many specie bacillus produce copious amount enzyme used various industry production alpha amylase used starch hydrolysis protease subtilisin used detergent b subtilis valuable model bacterial research bacillus specie synthesize secrete lipopeptides particular surfactins mycosubtilins bacillus specie also found marine sponge marine sponge associated bacillus subtilis strain ws1a ybs29 synthesize several antimicrobial peptide bacillus subtilis strain develop disease resistance labeo rohita4502552355.67111112.857143188.00.071404(bacillus, latin, stick, genus, grampositive, rodshaped, bacteria, member, phylum, bacillota, 266, named, specie, term, also, used, describe, shape, rod, certain, bacteria, plural, bacillus, name, class, bacteria, genus, belongs, bacillus, specie, either, obligate, aerobe, oxygen, dependent, facultative, anaerobe, ability, continue, living, absence, oxygen, cultured, bacillus, specie, test, positive, enzyme, catalase, oxygen, used, presentbacillus, reduce, oval, endospore, remain, dormant, state, year, endospore, one, specie, morocco, reported, survived, heated, 420, c, endospore, formation, usually, triggered, lack, nutrient, bacterium, divide, within, cell, wall, one, side, engulfs, true, 
spore, ie, offspring, endospore, formation, originally, defined, genus, specie, closely, related, many, specie, moved, genus, bacillota, one, ...)[(Latin, NORP), (Bacillota, ORG), (266, CARDINAL), (Bacilli, PERSON), (years, DATE), (one, CARDINAL), (Morocco, GPE), (420 °, CARDINAL), (Bacillota, LOC), (Only one, CARDINAL), (vivo, GPE), (many years, DATE), (Two, CARDINAL), (Labeo, GPE)][5, 2, 0, 0, 3, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0]520030010101010000
88Bacillus anthracisBacillus anthracis is a Gram-positive and rod-shaped bacterium that causes anthrax, a deadly disease to livestock and, occasionally, to humans. It is the only permanent (obligate) pathogen within the genus Bacillus. Its infection is a type of zoonosis, as it is transmitted from animals to humans. It was discovered by a German physician Robert Koch in 1876, and became the first bacterium to be experimentally shown as a pathogen. The discovery was also the first scientific evidence for the germ theory of diseases.B. anthracis measures about 3 to 5 μm long and 1 to 1.2 μm wide. The reference genome consists of a 5,227,419 bp circular chromosome and two extrachromosomal DNA plasmids, pXO1 and pXO2, of 181,677 and 94,830 bp respectively, which are responsible for the pathogenicity. It forms a protective layer called endospore by which it can remain inactive for many years and suddenly becomes infective under suitable environmental conditions. Because of the resilience of the endospore, the bacterium is one of the most popular biological weapons. The protein capsule (poly-D-gamma-glutamic acid) is key to evasion of the immune response. It feeds on the heme of blood protein haemoglobin using two secretory siderophore proteins, IsdX1 and IsdX2.\n\nUntreated B. anthracis infection is usually deadly. Infection is indicated by inflammatory, black, necrotic lesion (eschar). The sores usually appear on the face, neck, arms, or hands. The fatal symptoms include flu-like fever, chest discomfort, diaphoresis, and body aches. The first animal vaccine against anthrax was developed by French chemist Louis Pasteur in 1881. Different animal and human vaccines are now available. 
The infection can be treated with common antibiotics such as penicillins, quinolones, and tetracyclines.Bacteriaenbacillus anthracis grampositive rodshaped bacterium cause anthrax deadly disease livestock occasionally human permanent obligate pathogen within genus bacillus infection type zoonosis transmitted animal human discovered german physician robert koch 1876 became first bacterium experimentally shown pathogen discovery also first scientific evidence germ theory diseasesb anthracis measure 3 5 μm long 1 12 μm wide reference genome consists 5227419 bp circular chromosome two extrachromosomal dna plasmid pxo1 pxo2 181677 94830 bp respectively responsible pathogenicity form protective layer called endospore remain inactive many year suddenly becomes infective suitable environmental condition resilience endospore bacterium one popular biological weapon protein capsule polydgammaglutamic acid key evasion immune response feed heme blood protein haemoglobin using two secretory siderophore protein isdx1 isdx2 untreated b anthracis infection usually deadly infection indicated inflammatory black necrotic lesion eschar sore usually appear face neck arm hand fatal symptom include flulike fever chest discomfort diaphoresis body ache first animal vaccine anthrax developed french chemist louis pasteur 1881 different animal human vaccine available infection treated common antibiotic penicillin quinolones tetracycline2741516225.53284712.4545452178.50.086905(bacillus, anthracis, grampositive, rodshaped, bacterium, cause, anthrax, deadly, disease, livestock, occasionally, human, permanent, obligate, pathogen, within, genus, bacillus, infection, type, zoonosis, transmitted, animal, human, discovered, german, physician, robert, koch, 1876, became, first, bacterium, experimentally, shown, pathogen, discovery, also, first, scientific, evidence, germ, theory, diseasesb, anthracis, measure, 3, 5, μm, long, 1, 12, μm, wide, reference, genome, consists, 5227419, bp, circular, 
chromosome, two, extrachromosomal, dna, plasmid, pxo1, pxo2, 181677, 94830, bp, respectively, responsible, pathogenicity, form, protective, layer, called, endospore, remain, inactive, many, year, suddenly, becomes, infective, suitable, environmental, condition, resilience, endospore, bacterium, one, popular, biological, weapon, protein, capsule, polydgammaglutamic, acid, key, ...)[(German, NORP), (Robert Koch, PERSON), (1876, DATE), (first, ORDINAL), (first, ORDINAL), (about 3, CARDINAL), (1, CARDINAL), (1.2 μm, QUANTITY), (5,227,419, CARDINAL), (two, CARDINAL), (181,677, CARDINAL), (94,830, CARDINAL), (many years, DATE), (two, CARDINAL), (IsdX1, ORG), (IsdX2, ORG), (first, ORDINAL), (French, NORP), (Louis Pasteur, PERSON), (1881, DATE)][7, 3, 0, 0, 0, 0, 0, 0, 0, 2, 3, 2, 0, 2, 0, 1, 0, 0]730000000232020100
99Bacillus cereusBacillus cereus is a Gram-positive, rod-shaped, facultatively anaerobic, motile, beta-hemolytic, spore-forming bacterium commonly found in soil, food and marine sponges. The specific name, cereus, meaning "waxy" in Latin, refers to the appearance of colonies grown on blood agar. Some strains are harmful to humans and cause foodborne illness, while other strains can be beneficial as probiotics for animals. The bacteria is classically contracted from fried rice dishes that have been sitting at room temperature for hours. B. cereus bacteria are facultative anaerobes, and like other members of the genus Bacillus, can produce protective endospores. Its virulence factors include phospholipase C, cereulide, sphingomyelinase, metalloproteases, and cytotoxin K.The Bacillus cereus group comprises seven closely related species: B. cereus sensu stricto (referred to herein as B. cereus), B. anthracis, B. thuringiensis, B. mycoides, B. pseudomycoides, and B. cytotoxicus; or as six species in a Bacillus cereus sensu lato: B. weihenstephanensis, B. mycoides, B. pseudomycoides, B. cereus, B. thuringiensis, and B. 
anthracis.Bacteriaenbacillus cereus grampositive rodshaped facultatively anaerobic motile betahemolytic sporeforming bacterium commonly found soil food marine sponge specific name cereus meaning waxy latin refers appearance colony grown blood agar strain harmful human cause foodborne illness strain beneficial probiotic animal bacteria classically contracted fried rice dish sitting room temperature hour b cereus bacteria facultative anaerobe like member genus bacillus produce protective endospore virulence factor include phospholipase c cereulide sphingomyelinase metalloproteases cytotoxin kthe bacillus cereus group comprises seven closely related specie b cereus sensu stricto referred herein b cereus b anthracis b thuringiensis b mycoides b pseudomycoides b cytotoxicus six specie bacillus cereus sensu lato b weihenstephanensis b mycoides b pseudomycoides b cereus b thuringiensis b anthracis159967226.0817617.2272732147.0-0.091667(bacillus, cereus, grampositive, rodshaped, facultatively, anaerobic, motile, betahemolytic, sporeforming, bacterium, commonly, found, soil, food, marine, sponge, specific, name, cereus, meaning, waxy, latin, refers, appearance, colony, grown, blood, agar, strain, harmful, human, cause, foodborne, illness, strain, beneficial, probiotic, animal, bacteria, classically, contracted, fried, rice, dish, sitting, room, temperature, hour, b, cereus, bacteria, facultative, anaerobe, like, member, genus, bacillus, produce, protective, endospore, virulence, factor, include, phospholipase, c, cereulide, sphingomyelinase, metalloproteases, cytotoxin, kthe, bacillus, cereus, group, comprises, seven, closely, related, specie, b, cereus, sensu, stricto, referred, herein, b, cereus, b, anthracis, b, thuringiensis, b, mycoides, b, pseudomycoides, b, cytotoxicus, six, specie, bacillus, cereus, ...)[(Latin, LANGUAGE), (hours, TIME), (cereulide, ORG), (sphingomyelinase, ORG), (seven, CARDINAL), (six, CARDINAL), (B. 
weihenstephanensis, PERSON)][2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 1, 0]200001000002010010

Last rows

(index) | df_index | Name | Description | Type | lang | Description_clean | word_count | char_count | sentence_count | avg_word_length | avg_sentence_length | name_word_count | name_char_count | name_avg_word_length | Polarity | parsed | entity_tags | entity_types | CARDINAL | DATE | EVENT | FAC | GPE | LANGUAGE | LAW | LOC | MONEY | NORP | ORDINAL | ORG | PERCENT | PERSON | PRODUCT | QUANTITY | TIME | WORK_OF_ART
169 | 44 | Staphylococcus virus G1 | Staphylococcus virus G1 is a virus of the family Herelleviridae, genus Kayvirus.As a member of the group I of the Baltimore classification, Staphylococcus virus G1 is a dsDNA virus. All the family Herelleviridae members share a nonenveloped morphology consisting of a head and a tail separated by a neck. Its genome is linear. The propagation of the virions includes the attaching to a host cell (a bacterium, as Staphylococcus virus G1 is a bacteriophage) and the injection of the double stranded DNA; the host transcribes and translates it to manufacture new particles. To replicate its genetic content requires host cell DNA polymerases and, hence, the process is highly dependent on the cell cycle.The Gp67 protein of G1 has been found to interact with its host's RNA polymerase though an interaction with a sigma factor.The phage contains a genome of 138,715 base pairs with a 30.4% of GC content and 214 predicted genes; this means that the 88.5% of the DNA is coding open reading frames, and therefore the gene density (the number of genes per kilobase) is 1.54.\n\n\n== References == | Bacteriophage | en | staphylococcus virus g1 virus family herelleviridae genus kayvirusas member group baltimore classification staphylococcus virus g1 dsdna virus family herelleviridae member share nonenveloped morphology consisting head tail separated neck genome linear propagation virion includes attaching host cell bacterium staphylococcus virus g1 bacteriophage injection double stranded dna host transcribes translates manufacture new particle replicate genetic content requires host cell dna polymerase hence process highly dependent cell cyclethe gp67 protein g1 found interact host rna polymerase though interaction sigma factorthe phage contains genome 138715 base pair 304 gc content 214 predicted gene mean 885 dna coding open reading frame therefore gene density number gene per kilobase 154 reference | 180 | 909 | 12 | 5.050000 | 15.000000 | 3 | 21 | 7.000000 | -0.100727 | (staphylococcus, virus, g1, virus, family, herelleviridae, genus, kayvirusas, member, group, baltimore, classification, staphylococcus, virus, g1, dsdna, virus, family, herelleviridae, member, share, nonenveloped, morphology, consisting, head, tail, separated, neck, genome, linear, propagation, virion, includes, attaching, host, cell, bacterium, staphylococcus, virus, g1, bacteriophage, injection, double, stranded, dna, host, transcribes, translates, manufacture, new, particle, replicate, genetic, content, requires, host, cell, dna, polymerase, hence, process, highly, dependent, cell, cyclethe, gp67, protein, g1, found, interact, host, rna, polymerase, though, interaction, sigma, factorthe, phage, contains, genome, 138715, base, pair, 304, gc, content, 214, predicted, gene, mean, 885, dna, coding, open, reading, frame, therefore, gene, density, number, ...) | [(Herelleviridae, PERSON), (Kayvirus, ORG), (Baltimore, GPE), (Herelleviridae, PERSON), (Gp67, PERSON), (G1, PRODUCT), (138,715, CARDINAL), (30.4%, PERCENT), (GC, ORG), (214, CARDINAL), (88.5%, PERCENT), (1.54, CARDINAL)] | [3, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 3, 1, 0, 0, 0] | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 3 | 1 | 0 | 0 | 0
170 | 45 | Streptomyces phage Φ0 | Streptomyces phage Φ0 is a bacteriophage that infects Streptomyces. It was discovered in 2016. The bacteriophage contains a double-stranded RNA genome and probably belongs to the Cystoviridae family.\n\n\n== References == | Bacteriophage | en | streptomyces phage φ0 bacteriophage infects streptomyces discovered 2016 bacteriophage contains doublestranded rna genome probably belongs cystoviridae family reference | 30 | 189 | 4 | 6.300000 | 7.500000 | 3 | 19 | 6.333333 | 0.000000 | (streptomyces, phage, φ0, bacteriophage, infects, streptomyces, discovered, 2016, bacteriophage, contains, doublestranded, rna, genome, probably, belongs, cystoviridae, family, reference) | [(Streptomyces, ORG), (Streptomyces, FAC), (2016, DATE), (Cystoviridae, PERSON)] | [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0] | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0
171 | 46 | T4 rII system | The T4 rII system is an experimental system developed in the 1950s by Seymour Benzer for studying the substructure of the gene. The experimental system is based on genetic crosses of different mutant strains of bacteriophage T4, a virus that infects the bacteria E. coli. | Bacteriophage | en | t4 rii system experimental system developed 1950s seymour benzer studying substructure gene experimental system based genetic cross different mutant strain bacteriophage t4 virus infects bacteria e coli | 46 | 227 | 4 | 4.934783 | 11.500000 | 3 | 11 | 3.666667 | 0.075000 | (t4, rii, system, experimental, system, developed, 1950s, seymour, benzer, studying, substructure, gene, experimental, system, based, genetic, cross, different, mutant, strain, bacteriophage, t4, virus, infects, bacteria, e, coli) | [(T4, ORG), (the 1950s, DATE), (Seymour Benzer, PERSON), (T4, ORG)] | [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0] | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0
172 | 47 | Tectivirus | Tectiviridae is a family of viruses with 10 species in five genera. Bacteria serve as natural hosts. Tectiviruses have no head-tail structure, but are capable of producing tail-like tubes of ~ 60×10 nm upon adsorption or after chloroform treatment. The name is derived from Latin tectus (meaning 'covered'). | Bacteriophage | en | tectiviridae family virus 10 specie five genus bacteria serve natural host tectiviruses headtail structure capable producing taillike tube 6010 nm upon adsorption chloroform treatment name derived latin tectus meaning covered | 48 | 260 | 5 | 5.416667 | 9.600000 | 1 | 10 | 10.000000 | 0.150000 | (tectiviridae, family, virus, 10, specie, five, genus, bacteria, serve, natural, host, tectiviruses, headtail, structure, capable, producing, taillike, tube, 6010, nm, upon, adsorption, chloroform, treatment, name, derived, latin, tectus, meaning, covered) | [(Tectiviridae, GPE), (10, CARDINAL), (five, CARDINAL), (Latin, NORP)] | [2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0] | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
173 | 48 | Temperateness (virology) | In virology, temperate refers to the ability of some bacteriophages (notably coliphage λ) to display a lysogenic life cycle. Many (but not all) temperate phages can integrate their genomes into their host bacterium's chromosome, together becoming a lysogen as the phage genome becomes a prophage. A temperate phage is also able to undergo a productive, typically lytic life cycle, where the prophage is expressed, replicates the phage genome, and produces phage progeny, which then leave the bacterium. With phage the term virulent is often used as an antonym to temperate, but more strictly a virulent phage is one that has lost its ability to display lysogeny through mutation rather than a phage lineage with no genetic potential to ever display lysogeny (which more properly would be described as an obligately lytic phage). | Bacteriophage | en | virology temperate refers ability bacteriophage notably coliphage λ display lysogenic life cycle many temperate phage integrate genome host bacterium chromosome together becoming lysogen phage genome becomes prophage temperate phage also able undergo productive typically lytic life cycle prophage expressed replicates phage genome produce phage progeny leave bacterium phage term virulent often used antonym temperate strictly virulent phage one lost ability display lysogeny mutation rather phage lineage genetic potential ever display lysogeny properly would described obligately lytic phage | 133 | 697 | 5 | 5.240602 | 26.600000 | 2 | 23 | 11.500000 | 0.309259 | (virology, temperate, refers, ability, bacteriophage, notably, coliphage, λ, display, lysogenic, life, cycle, many, temperate, phage, integrate, genome, host, bacterium, chromosome, together, becoming, lysogen, phage, genome, becomes, prophage, temperate, phage, also, able, undergo, productive, typically, lytic, life, cycle, prophage, expressed, replicates, phage, genome, produce, phage, progeny, leave, bacterium, phage, term, virulent, often, used, antonym, temperate, strictly, virulent, phage, one, lost, ability, display, lysogeny, mutation, rather, phage, lineage, genetic, potential, ever, display, lysogeny, properly, would, described, obligately, lytic, phage) | [] | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
174 | 49 | Transduction (genetics) | Transduction is the process by which foreign DNA is introduced into a cell by a virus or viral vector. An example is the viral transfer of DNA from one bacterium to another and hence an example of horizontal gene transfer. Transduction does not require physical contact between the cell donating the DNA and the cell receiving the DNA (which occurs in conjugation), and it is DNase resistant (transformation is susceptible to DNase). Transduction is a common tool used by molecular biologists to stably introduce a foreign gene into a host cell's genome (both bacterial and mammalian cells). | Bacteriophage | en | transduction process foreign dna introduced cell virus viral vector example viral transfer dna one bacterium another hence example horizontal gene transfer transduction require physical contact cell donating dna cell receiving dna occurs conjugation dnase resistant transformation susceptible dnase transduction common tool used molecular biologist stably introduce foreign gene host cell genome bacterial mammalian cell | 97 | 495 | 5 | 5.103093 | 19.400000 | 2 | 22 | 11.000000 | -0.137500 | (transduction, process, foreign, dna, introduced, cell, virus, viral, vector, example, viral, transfer, dna, one, bacterium, another, hence, example, horizontal, gene, transfer, transduction, require, physical, contact, cell, donating, dna, cell, receiving, dna, occurs, conjugation, dnase, resistant, transformation, susceptible, dnase, transduction, common, tool, used, molecular, biologist, stably, introduce, foreign, gene, host, cell, genome, bacterial, mammalian, cell) | [] | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
175 | 50 | Viral plaque | A viral plaque is a visible structure formed after introducing a viral sample to a cell culture grown on some nutrient medium. The virus will replicate and spread, generating regions of cell destruction known as plaques. For example, Vero cell or other tissue cultures may be used to investigate an influenza virus or coronavirus, while various bacterial cultures would be used for bacteriophages.\nCounting the number of plaques can be used as a method of virus quantification. These plaques can sometimes be detected visually using colony counters, in much the same way as bacterial colonies are counted; however, they are not always visible to the naked eye, and sometimes can only be seen through a microscope, or using techniques such as staining (e.g. neutral red for eukaryotes or giemsa for bacteria) or immunofluorescence. Special computer systems have been designed with the ability to scan samples in batches.\n\nThe appearance of the plaque depends on the host strain, virus and the conditions. Highly virulent or lytic strains create plaques that look clear (due to total cell destruction), while strains that only kill a fraction of their hosts (due to partial resistance/lysogeny), or only reduce the rate of cell growth, give turbid plaques. Some partially lysogenic phages give bull's-eye plaques with spots or rings of growth in the middle of clear regions of complete lysis. | Bacteriophage | en | viral plaque visible structure formed introducing viral sample cell culture grown nutrient medium virus replicate spread generating region cell destruction known plaque example vero cell tissue culture may used investigate influenza virus coronavirus various bacterial culture would used bacteriophage counting number plaque used method virus quantification plaque sometimes detected visually using colony counter much way bacterial colony counted however always visible naked eye sometimes seen microscope using technique staining eg neutral red eukaryote giemsa bacteria immunofluorescence special computer system designed ability scan sample batch appearance plaque depends host strain virus condition highly virulent lytic strain create plaque look clear due total cell destruction strain kill fraction host due partial resistancelysogeny reduce rate cell growth give turbid plaque partially lysogenic phage give bullseye plaque spot ring growth middle clear region complete lysis | 223 | 1170 | 12 | 5.246637 | 18.583333 | 2 | 11 | 5.500000 | 0.020097 | (viral, plaque, visible, structure, formed, introducing, viral, sample, cell, culture, grown, nutrient, medium, virus, replicate, spread, generating, region, cell, destruction, known, plaque, example, vero, cell, tissue, culture, may, used, investigate, influenza, virus, coronavirus, various, bacterial, culture, would, used, bacteriophage, counting, number, plaque, used, method, virus, quantification, plaque, sometimes, detected, visually, using, colony, counter, much, way, bacterial, colony, counted, however, always, visible, naked, eye, sometimes, seen, microscope, using, technique, staining, eg, neutral, red, eukaryote, giemsa, bacteria, immunofluorescence, special, computer, system, designed, ability, scan, sample, batch, appearance, plaque, depends, host, strain, virus, condition, highly, virulent, lytic, strain, create, plaque, look, clear, due, ...) | [(Vero, ORG)] | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
176 | 51 | Viral shunt | The viral shunt is a mechanism that prevents marine microbial particulate organic matter (POM) from migrating up trophic levels by recycling them into dissolved organic matter (DOM), which can be readily taken up by microorganisms. The DOM recycled by the viral shunt pathway is comparable to the amount generated by the other main sources of marine DOM.Viruses can easily infect microorganisms in the microbial loop due to their relative abundance compared to microbes. Prokaryotic and eukaryotic mortality contribute to carbon nutrient recycling through cell lysis. There is evidence as well of nitrogen (specifically ammonium) regeneration. This nutrient recycling helps stimulates microbial growth. As much as 25% of the primary production from phytoplankton in the global oceans may be recycled within the microbial loop through the viral shunt. | Bacteriophage | en | viral shunt mechanism prevents marine microbial particulate organic matter pom migrating trophic level recycling dissolved organic matter dom readily taken microorganism dom recycled viral shunt pathway comparable amount generated main source marine domviruses easily infect microorganism microbial loop due relative abundance compared microbe prokaryotic eukaryotic mortality contribute carbon nutrient recycling cell lysis evidence well nitrogen specifically ammonium regeneration nutrient recycling help stimulates microbial growth much 25 primary production phytoplankton global ocean may recycled within microbial loop viral shunt | 129 | 724 | 8 | 5.612403 | 16.125000 | 2 | 10 | 5.000000 | 0.127778 | (viral, shunt, mechanism, prevents, marine, microbial, particulate, organic, matter, pom, migrating, trophic, level, recycling, dissolved, organic, matter, dom, readily, taken, microorganism, dom, recycled, viral, shunt, pathway, comparable, amount, generated, main, source, marine, domviruses, easily, infect, microorganism, microbial, loop, due, relative, abundance, compared, microbe, prokaryotic, eukaryotic, mortality, contribute, carbon, nutrient, recycling, cell, lysis, evidence, well, nitrogen, specifically, ammonium, regeneration, nutrient, recycling, help, stimulates, microbial, growth, much, 25, primary, production, phytoplankton, global, ocean, may, recycled, within, microbial, loop, viral, shunt) | [(DOM, ORG), (DOM, ORG), (As much as 25%, PERCENT)] | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0
177 | 52 | Auxiliary metabolic genes | Auxiliary metabolic genes (AMGs) are found in many bacteriophages but originated in bacterial cells. AMGs modulate host cell metabolism during infection so that the phage can replicate more efficiently. For instance, bacteriophages that infect the abundant marine cyanobacteria Synechococcus and Prochlorococcus (cyanophages) carry AMGs that have been acquired from their immediate host as well as more distantly-related bacteria. Cyanophage AMGs support a variety of functions including photosynthesis, carbon metabolism, nucleic acid synthesis and metabolism.\n\n\n== References == | Bacteriophage | en | auxiliary metabolic gene amgs found many bacteriophage originated bacterial cell amgs modulate host cell metabolism infection phage replicate efficiently instance bacteriophage infect abundant marine cyanobacteria synechococcus prochlorococcus cyanophages carry amgs acquired immediate host well distantlyrelated bacteria cyanophage amgs support variety function including photosynthesis carbon metabolism nucleic acid synthesis metabolism reference | 76 | 505 | 5 | 6.644737 | 15.200000 | 3 | 23 | 7.666667 | 0.525000 | (auxiliary, metabolic, gene, amgs, found, many, bacteriophage, originated, bacterial, cell, amgs, modulate, host, cell, metabolism, infection, phage, replicate, efficiently, instance, bacteriophage, infect, abundant, marine, cyanobacteria, synechococcus, prochlorococcus, cyanophages, carry, amgs, acquired, immediate, host, well, distantlyrelated, bacteria, cyanophage, amgs, support, variety, function, including, photosynthesis, carbon, metabolism, nucleic, acid, synthesis, metabolism, reference) | [(Synechococcus, GPE), (Prochlorococcus, ORG)] | [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0] | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
178 | 53 | WO virus | WO virus is bacteriophage virus that infects bacteria of the genus Wolbachia, which it is named after. This virus is notable for carrying DNA related to the black widow spider toxin gene, becoming an example of a bacteriophage with animal-like DNA, implying DNA transfers between eukaryotes and bacteriophages.\n\n\n== References == | Bacteriophage | en | wo virus bacteriophage virus infects bacteria genus wolbachia named virus notable carrying dna related black widow spider toxin gene becoming example bacteriophage animallike dna implying dna transfer eukaryote bacteriophage reference | 50 | 280 | 3 | 5.600000 | 16.666667 | 2 | 7 | 3.500000 | 0.195833 | (wo, virus, bacteriophage, virus, infects, bacteria, genus, wolbachia, named, virus, notable, carrying, dna, related, black, widow, spider, toxin, gene, becoming, example, bacteriophage, animallike, dna, implying, dna, transfer, eukaryote, bacteriophage, reference) | [(Wolbachia, PERSON)] | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0